ABSTRACT

This thesis investigates, using in-line simulation, the effect of non-deterministic 
runtime distributions on the performance of SmartNet’s schedule execution using the 
Opportunistic Load Balancing (OLB) Algorithm, the Limited Best Assignment (LBA) 
Algorithm, an O(mn^) Greedy Algorithm, and an O(mn) Greedy Algorithm. SmartNet
is a framework for scheduling jobs and machines in a heterogeneous computing
environment. Its major strength is its use of both current machine loads and pre- 
dicted job/machine performance when generating schedules. Schedules are built to 
meet various Quality of Service requirements using the above algorithms among others. 
We enhanced SmartNet’s simulator so that the runtime distributions could be used for 
experimentation. The distributions were generated using derivations from our study 
on NAS Benchmarks. Experiments were run for various categories of job/machine 
heterogeneity to compare the algorithms which account for both load and expected 
performance (the Greedy algorithms) against OLB and LBA. 

For all categories of heterogeneity, the greedy algorithms outperformed the 
other two algorithms for both truncated Gaussian and exponential distributions. For 
these same distributions, the O(mn) Greedy algorithm performed as well as the
O(mn^) Greedy algorithm when the heterogeneity of jobs and machines was high.

INTRODUCTION



This thesis investigates the effect of non-deterministic run-times on the per- 
formance of jobs scheduled by SmartNet [Ref. 1, 2, 3, 4] in a heterogeneous computing 
environment. It has already been shown that if jobs are scheduled by SmartNet and
they run for exactly the expected amount of time, the overall performance of the
system is improved. SmartNet currently computes the expected run-time of a job by 
averaging previous run-times which it stores in its database after a job terminates. 
However, jobs rarely run for exactly this expected amount of time; even if a job is run 
repeatedly with exactly the same parameters, on exactly the same machine, run-times 
may differ due to memory stalls. Under less ideal conditions, when a job is using a 
data file located on a remote file server, run-time variations become even more
pronounced. When the values of parameters are changed, the run-time can be drastically
different. SmartNet attempts to account for parameter value changes using a concept
called “compute characteristics” [Ref. 5], but it will often be the case that, at any
given time, at least one job will be running with some unidentified compute
characteristics. Therefore, this thesis seeks to identify the expected performance of jobs in
computing environments where there are changing or unknown compute characteristics.
In particular, it focuses on the time of completion of the last job. It compares
SmartNet performance under these conditions against performance without SmartNet.
Specifically, it compares some of SmartNet’s intelligent algorithms, which use
expected run-times, against another scheduling algorithm that does not use expected
run-times: Opportunistic Load Balancing (OLB). SmartNet’s intelligent algorithms
have been shown to outperform this algorithm when jobs do run for exactly their
expected run-times; this thesis will document the comparison of SmartNet against this
algorithm when the actual run-times of jobs are non-deterministic.

To relate the research in this thesis to other fields, we now present an example
that demonstrates how we can convert parameters that are typically random and
uncontrollable into more predictable and expected factors. The idea is to be able to
exercise more control on the input to an algorithm that incorporates multiple para- 
meters, many of which may be environmental factors, so that the unpredictable nature 
of the algorithm’s output is lessened. To some degree, an algorithm can then be made 
more useful. 

A real world example of this situation is that of providing indirect fire. Mortars 
and artillery are indirect fire weapons. Indirect fire is the delivery of explosive ord- 
nance along a parabolic or near-parabolic path from the weapon to the target. This is 
different from the way rifles, pistols, and tanks deliver ordnance, which is along a line 
of sight path from the weapon to the target. The parabolic path of artillery allows 
ordnance to be delivered across great distances and over significant terrain such as 
hills. A parabolic path, however, allows more factors, many of them uncertain, to 
influence the outcome of an indirect round. It is the way that these uncertainties are 
accounted for that is the crux of our example. 

Figure 1 shows how indirect weapon fires might impact against a target. The 
nature of indirect fire causes impacts near the target to disperse mostly along the 
gun-target line but also somewhat left and right of that line. The resulting footprint 
is basically an elliptical pattern with the majority of the impacts lying near the center 
of the ellipse. This is because rounds fired indirectly are subject to the effects of wind, 
temperature, and the rotation of the earth. Because velocities of rounds are slower, the 
time of flight of a round is longer, and it is subject to effects not normally considered 
by a line of sight weapon system. There are also factors particular to the weapon 
system that can cause rounds to impact with limited precision, as shown in Figure 1. 
The temperature of the gun tube, the temperature of the powder used to fire the round 
from the tube, and the seat of the artillery round against the inside of the tube all 
affect whether the round is fired optimally. If a round is fired optimally, we expect
that round to hit the target. If factors such as tube and powder temperature or the
effects of wind at higher elevations are not considered in the solution, we expect the
round may miss.

The artillery community strives to reduce the number of unknown variables 
present in indirect fire. There are parameters that are external to the artillery mech- 
anism that are major influences upon the outcome. These influences can be measured 
and their effect compensated for. The artillery community has taken a considerable 
amount of time and effort to understand, develop tools for measuring, and compensate 
for these influences. If consistent and timely measurements are made and applied to
the artillery solution, we can minimize the effects of outside influences and shoot “first
round, on target” with impunity.

It is the reduction of unknowns which is the eventual goal of this thesis. That 
is, this research strives to understand the external influences upon SmartNet that
might keep it from performing optimally and to determine how best to compensate
for these influences. This thesis begins to address this problem by striving to understand
the impact of unknowns upon SmartNet’s schedules.



A. BACKGROUND INFORMATION 

Scheduling, in general, is a difficult problem [Ref. 6]. As an example, consider 
the task of scheduling troop and equipment movement from the United States to the 
east coast of Africa. We describe our example in terms of optimization theory. There 
are many factors that need to be considered in order to create a schedule for troop 
and equipment movement. One of the first and most obvious considerations is to 
determine the maximum possible movement rate of troops and equipment into the 
area. Only after this maximum movement rate is determined can scheduling begin.
The following additional factors must then be considered:

• The mission commander will set priorities on units and equipment. He will 
also specify times at which units and equipment must arrive in the theater. 
The deadlines serve as scheduling constraints, whereas the priorities will be 
incorporated into the optimization function. 

• Certain pieces of equipment can only be transported by the largest aircraft or 
by ship. These additional constraints often result in higher transport time. 

• An additional example of constraints is the need for a Marine unit to arrive on 
foreign soil within 72 hours of an identified crisis. The footprint of a forward 
deployed unit will be small, and their sustainment capability limited to 30 
days. Deployment of this unit into the area of operations needs to be planned 
for; furthermore, the effect of placing a unit into the area of crisis on the will 
of the foreign force to wage war must be incorporated into the optimization
criteria. 

• Unfortunately individual threats cannot be considered as local optimization 
problems. We have a large number of air transport assets that are committed
globally, which means that 100% of these assets can never be committed to a
single local contingency.

• Unfortunately, variables specific to location, such as airfield capacity, may need
to be separately modeled throughout the world. Although movement of troops
and equipment by air from the US is very flexible, the movement of troops and
equipment into a foreign port or airfield may not be. 

• Time is another very important consideration. Time must be managed as 
effectively and efficiently as possible, and if possible, used to advantage. It 
takes time to match a contingency plan to the actual scenario, to start the 
plan, to actually follow the plan, and to revise and correct the plan. The 
amount of time a commander thinks he has to build up his forces will help him 
set his priorities for the arrival of equipment and units in theater. 

Unfortunately, a single schedule will not suffice. Many “what-if contingencies” 
need to be calculated; situations can change quickly and schedules must change to 
accommodate the dynamically changing environment. Being flexible and adaptable 
are hallmarks of success in any military operation. Constant updates of the current 
state of movement into the area are required to ensure that the schedule is still valid 
and effective. Planes and ships and trucks break down, weather changes for the worse, 
new regional contingencies pop up, and political pressures rise and fall. Schedules 
must be recalculated to take into account both opportune advantages and unexpected 
problems. It is the challenge of the scheduler to determine and properly analyze the 
current state of deployments and movement, as well as the causes of any changes. 
In summary, we cannot predict exactly how long any given transport operation will 
require, but we can often match the transport operation mean time and variance to a 
common probability distribution such as Gaussian or exponential. 

The creation of a movement schedule in the above example will also be limited
by the availability of accurate state information. Acquiring total knowledge of an
environment, and a complete understanding of the interoperability of the assets in
that environment, is
a challenging problem. Scheduling decisions are, more often than not, made with 
limited, and often only “best guess” information. This type of decision making will 
only reach an optimal solution by accident; a scheduling tool that accounts for variance
in transport times would be very useful to commanders in charge of these operations. 
This thesis will advance the state-of-the-art in heterogeneous schedulers that can, in
the future, not only be applicable to scheduling in computing environments, but also
to the problem of scheduling troop movement. 

As we have hinted, our example above has a direct correlation to a heterogeneous
computing environment. In a heterogeneous computing environment, machines of 
different architectures are often linked together via a network. The machines may 
be located in the same room or on different continents, or aboard sea-going vessels 
or on satellites. The variety of architectures in the heterogeneous system provides
capabilities above and beyond what you would find in an environment consisting only 
of machines with similar architecture. Below is an example that illustrates these 
additional capabilities. 

Consider the capabilities of the Single Instruction, Multiple Data (SIMD) ma- 
chine. 



SIMD machines (Single Instruction, Multiple Data) are an inexpensive 
way to construct parallel computer systems. A typical SIMD architecture is 
illustrated in Figure 2. 

A single front end controls the entire system; the front end fetches and 
decodes instructions. It includes (typically) a scalar processor core (usually a 
RISC machine), plus additional instructions to control the parallel processor 
ensemble. The front end usually has its own memory to hold the program and 
scalar data. 

The back end comprises many (up to thousands) processing elements 
(PEs). Each can perform arithmetic operations, memory fetches, and can send 
and receive messages. The systems essentially replicate the data path of a 
processor in each PE, but the control part of the processor resides only in the 
front end. This makes SIMD machines economical to design and build. 

When the front end issues a parallel instruction, it broadcasts the in- 
struction to all PEs, which all execute the instruction in parallel. Thus, a single 
instruction is performed on all data simultaneously. [Ref. 7, pages 746-747]



The capability of a SIMD architecture is maximized, then, when used with 
programs that require the same instruction or set of instructions be performed on 
many different “pieces” of data. For example, SIMD machines manipulate matrices 
better than single processor machines. 
Another machine that might be found in a heterogeneous system is a vector processing
machine such as a CRAY. CRAY computers set the standard for high performance
vector super-computing, and are still utilized worldwide* when there is a need
for enormous computational capability. The Y-MP EL, a CRAY mini-supercomputer, 
provides pipelining and segmentation, which are integral features of this architecture 
that support parallel processing aboard a single chip. Vector processing is provided 
to enable a programmer to sustain maximal I/O CPU throughput. Vector processing 
increases computing speed because the execution of a single instruction can allow an
operation to be performed sequentially on a set (or vector) of operands. [Ref. 8] 
This type of architecture is suitable for analyzing vectorized data, such as weather or 
satellite information. 

In order to maximize the use of a heterogeneous computing environment con- 
sisting of diverse architectures such as the CRAY and SIMD machines discussed above, 
knowledge of both the machines in the environment and the programs to be run on 
each machine is required. It may be a waste of compute power to run a job on a
machine that is not best suited for the job. The resulting run-times could be large enough to
retard productivity and efficiency even on a lightly loaded system. The problem is 
compounded on a heavily loaded system. Often, throughput maximization is a goal. 
Throughput maximization in a heterogeneous environment might mean optimal use 
of the resources, such that a minimal number of compute cycles are “wasted” doing 
work better suited to the capabilities of other architectures or machines. 

SmartNet is a scheduler that attempts to compute the best scheduling policy for 
tasks in a shared, heterogeneous computing environment. Such a situation is analogous 
to the previous troop and equipment movement example. The transport mechanisms 
are comparable to various machines in a heterogeneous system. Jobs needing to be 
run on a heterogeneous system are comparable to the units and equipment that need 
to be moved. The commander needs as much information as possible in order to create
a near-optimal schedule.

*CRAY machines are now manufactured by Silicon Graphics, Incorporated.

SmartNet is also analogous to the military logistical planner. SmartNet is dis- 
cussed in detail in Chapter II of this thesis. SmartNet is a scheduling framework for 
heterogeneous computing environments. It manages both jobs and machine resources 
in that environment. SmartNet manages these assets by creating a near-optimal sched- 
ule of jobs to be run on machines located on the network. SmartNet takes many factors 
into account, including the performance of jobs on the various architectures, the com- 
pute characteristics of a job, current machine loads, and the state of the heterogeneous 
system. [Ref. 1] 

B. STATEMENT OF PROBLEM 

Prior to our research, SmartNet had a rudimentary simulation mode that al- 
lowed its scheduling policies to be assessed without tying up the network and wasting 
valuable compute cycles on machines that may or may not be “owned” by the testing 
facility. The SmartNet simulator built a schedule from a set of requested jobs and 
a database containing information about jobs, machines, and job-machine pairs. In 
simulator mode, SmartNet then performed a discrete event simulation of the execution 
of the schedule. This previous SmartNet simulator used the expected time to compute
(ETC) value, which is the average of the previous run-times of the job on
the same machine, as the simulated run-time. The problem with using ETC values
for run-times is that a job will rarely, if ever, execute for exactly the amount of time
expected. The use of the ETC value for simulated run-time duration means that the 
SmartNet simulator does not produce realistic simulation results. 
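
To make the simulation step concrete, the following is a minimal sketch of a discrete event simulation of a fixed schedule. It is not SmartNet's simulator code; the function name `simulate_schedule`, the dictionary layout of the schedule, and the sample ETC values are all assumptions made for illustration. The run-time of each job is supplied by a callback, so that the ETC value (as in the original simulator) or any other run-time model can be plugged in.

```python
import heapq


def simulate_schedule(schedule, runtime):
    """Discrete event simulation of a fixed schedule (illustrative sketch).

    schedule: dict mapping machine name -> ordered list of job names.
    runtime:  function (job, machine) -> simulated run-time.
    Returns (finish_times, makespan), where makespan is the completion
    time of the last job to finish.
    """
    events = []        # min-heap of (finish_time, machine, job) completion events
    position = {}      # index of the job currently running on each machine
    finish_times = {}

    # Start the first job in every machine's queue at time 0.
    for machine, queue in schedule.items():
        if queue:
            position[machine] = 0
            heapq.heappush(events, (runtime(queue[0], machine), machine, queue[0]))

    clock = 0.0
    while events:
        # Advance the clock to the next job completion.
        clock, machine, job = heapq.heappop(events)
        finish_times[job] = clock
        position[machine] += 1
        queue = schedule[machine]
        if position[machine] < len(queue):
            nxt = queue[position[machine]]
            heapq.heappush(events, (clock + runtime(nxt, machine), machine, nxt))
    return finish_times, clock


# Hypothetical ETC table: (job, machine) -> expected time to compute.
etc = {("j1", "m1"): 4.0, ("j2", "m1"): 3.0, ("j3", "m2"): 5.0}
schedule = {"m1": ["j1", "j2"], "m2": ["j3"]}

# Using the ETC value itself as the simulated run-time.
finish, makespan = simulate_schedule(schedule, lambda job, machine: etc[(job, machine)])
```

With the ETC value used as the run-time, every simulation of the same schedule yields the same result, which is exactly the limitation described above.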

C. GOAL 

The goal of this thesis is to investigate the effects that different run-time
distributions have on the performance of SmartNet. We will enhance the SmartNet
simulator to provide, as the simulated run-time, a randomly generated run-time from
a reasonable run-time distribution for each job. This enhancement will enable us to
investigate the efficiency of schedules resulting from the different scheduling algorithms
available in SmartNet under more realistic conditions. Simulations using our modi- 
fied simulator will contribute to an understanding of the value of SmartNet in less 
controlled environments, such as in the DOD’s Joint Task Force Advanced Techno- 
logy Demonstration (JTF-ATD) and Battlefield Awareness and Data Dissemination 
(BADD) programs. Additionally, although not part of this thesis work, such schedulers
will likely become useful to commanders in the logistical scenario described in
our above example. 
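
As an illustration of drawing a simulated run-time from a distribution rather than using the ETC value directly, the sketch below samples from an exponential distribution and from a Gaussian truncated below. The function names, the choice of zero as the truncation point, and the standard deviation used in the example are assumptions for illustration only; they are not taken from the enhanced simulator.

```python
import random


def sample_exponential(rng, mean):
    """Draw a run-time from an exponential distribution with the given mean."""
    return rng.expovariate(1.0 / mean)


def sample_truncated_gaussian(rng, mean, stddev, low=0.0):
    """Draw from a Gaussian, rejecting any value at or below `low`.

    Rejection sampling is used for simplicity. Truncation shifts the
    mean of the resulting distribution slightly upward.
    """
    while True:
        value = rng.gauss(mean, stddev)
        if value > low:
            return value


# Hypothetical usage: one simulated run-time per job-machine pair,
# centered on that pair's ETC value.
rng = random.Random(42)
etc = {("j1", "m1"): 4.0, ("j2", "m1"): 3.0, ("j3", "m2"): 5.0}
simulated = {
    pair: sample_truncated_gaussian(rng, mean, stddev=0.25 * mean)
    for pair, mean in etc.items()
}
```

Seeding the generator makes an experiment repeatable, which is convenient when the same set of run-times must be replayed against several scheduling algorithms.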

D. THESIS ORGANIZATION 

This thesis is organized as follows. Chapter II provides a detailed look at 
SmartNet. Chapter III is concerned with discrete event simulation as it pertains to 
SmartNet simulation mode. Chapter IV deals with the enhancements that we made 
to the SmartNet simulator. Chapter V details the experiments performed with the
enhanced simulator, as well as the results obtained from these experiments. Chapter VI
summarizes the conclusions drawn from these experiments and discusses further
related research opportunities.
[Figure: block diagram with FRONT END and BACK END panels]

Figure 2. Single Instruction, Multiple Data (SIMD) Machine Architecture. The front
end is a RISC chip with memory, used to control the back end. The back end is a
matrix arrangement of relatively cheap processors. Each processor performs the same
operation on different data, as directed by the front end. In this figure, only a small
portion of the back end is shown. Actual matrices of processors can be quite large,
up to 32, 64, 128 processors or more.
II. SMARTNET



A. INTRODUCTION 

This chapter describes SmartNet in considerable detail. Section B provides 
general information about SmartNet and why it was developed. Section C describes 
how SmartNet operates. Section D contains information about the architecture of 
SmartNet. Section E summarizes some previous results from the application of Smart- 
Net to scheduling problems. Finally, Section F provides examples of the Opportunistic 
Load Balancing, Limited Best Assignment, and other SmartNet scheduling algorithms. 

B. BACKGROUND INFORMATION 

SmartNet is a framework for scheduling resources in a heterogeneous comput- 
ing environment [Ref. 1]. It has been in development for over 10 years at the Naval 
Command, Control, and Ocean Surveillance Center (NCCOSC) Research, Development,
Test and Evaluation (RDTE) Division, San Diego, California. The principal
scientist is Richard Freund; however, the SmartNet Development Team consists of 
government employees and contractors working in various locations across the United 
States. The software currently contains over 100,000 lines of code, developed with 24 
staff-years of effort.

The computing world is full of heterogeneous computing environments. They 
exist wherever machines with distinctly different architectures are networked. The 
machines may be connected for any number of reasons, but the environment that most 
demands a product with SmartNet’s capabilities is an environment used to perform
input-output intensive [Ref. 9] and/or compute intensive jobs [Ref. 1]. 

Current and future high performance computing (HPC) applications need in- 
creasing amounts of computing power. Because of this, there is an increasing focus 
on maximizing the productivity and efficiency of all available computing assets. In 
most HPC centers, local and remotely available computers comprise a heterogeneous
network. By allowing all of these assets to be utilized by a maximum number of
applications, the connected assets in effect become a metacomputer. Figure 3 is a 
pictorial description of this concept. 




Figure 3. The Metacomputer Concept. Many HPC sites are connected to form a 
large, powerful, distributed virtual machine. 

Ongoing efforts within the research community include creating distributed 
computing environments (DCEs) in order to further maximize the potential compute 
power of these heterogeneous assets. Resource management systems (RMSs) have
been incorporated into existing computing environments with the goal of better
managing the set of resources. DCEs and RMSs have fostered improvements in HPC, but
still do not tackle the difficult problem of scheduling jobs and machines intelligently.

SmartNet is capable of supplementing the efforts of DCEs and RMSs to more
fully maximize the compute capability in a heterogeneous computing environment. Its
focus is on optimizing a set of tasks instead of each task singly. [Ref. 10]

While SmartNet is not the only advanced scheduling system under develop- 
ment, it does have features that distinguish it from other packages. Most scheduling 
efforts to date utilize Opportunistic Load Balancing (OLB) to develop scheduling solu- 
tions. OLB is a method by which jobs are scheduled based upon the current loads on 
the machines. If there is an open or unloaded machine, OLB schedules a job to run on 
that machine. Put simply, it is a form of “queue management,” whereby the queues are 
evenly loaded with no attention being paid to jobs already enqueued or the expected 
run-time of the same job on different machines (ETC). Another scheduling technique, 
which uses the ETC concept that was pioneered by SmartNet, is Limited Best
Assignment (LBA). LBA considers one of the important parameters of scheduling, the
expected performance of each job on the various architectures in the heterogeneous 
computing environment. LBA assigns each job to the machine upon which it is ex- 
pected to execute the fastest [Ref. 1], assuming (unrealistically) that no other job is 
using that machine. Both OLB and LBA consider only half of the information that is 
required for the creation of a near-optimal schedule. 
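
The contrast between the two techniques can be sketched in a few lines. The code below is one interpretation of the descriptions above, not SmartNet's implementation: here OLB is modeled as placing each job on the machine with the shortest queue (ignoring ETC values), and LBA as placing each job on the machine with the smallest ETC (ignoring load). The function names and the ETC table are hypothetical.

```python
def olb(jobs, machines):
    """Opportunistic Load Balancing sketch: keep queue lengths even.

    Only the number of queued jobs is considered; ETC values are ignored.
    Ties go to the machine listed first.
    """
    queues = {m: [] for m in machines}
    for job in jobs:
        target = min(machines, key=lambda m: len(queues[m]))
        queues[target].append(job)
    return queues


def lba(jobs, machines, etc):
    """Limited Best Assignment sketch: pick the machine with the lowest ETC.

    Machine load is ignored, as if no other job were using the machine.
    """
    queues = {m: [] for m in machines}
    for job in jobs:
        target = min(machines, key=lambda m: etc[(job, m)])
        queues[target].append(job)
    return queues


def makespan(queues, etc):
    """Completion time of the last job, assuming each job runs for its ETC."""
    return max(sum(etc[(job, m)] for job in queue) for m, queue in queues.items())


# Hypothetical ETC table for three jobs on a fast and a slow machine.
machines = ["fast", "slow"]
jobs = ["j1", "j2", "j3"]
etc = {
    ("j1", "fast"): 2.0, ("j1", "slow"): 10.0,
    ("j2", "fast"): 3.0, ("j2", "slow"): 12.0,
    ("j3", "fast"): 4.0, ("j3", "slow"): 6.0,
}

olb_span = makespan(olb(jobs, machines), etc)        # load only
lba_span = makespan(lba(jobs, machines, etc), etc)   # ETC only
```

In this made-up instance OLB finishes at time 12 and LBA at time 9, while a schedule that runs j1 and j2 on the fast machine and j3 on the slow machine finishes at time 6. Neither technique finds that schedule, because each looks at only one of the two kinds of information.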

SmartNet considers both job performance and machine loads in its schedule 
creation. Armed with these two parameters, it develops a better schedule. Section F 
of this chapter provides examples of how a better schedule is generated using this 
information. 

C. SMARTNET’S PURPOSE 
1. Goal of SmartNet 

SmartNet is a scheduling framework for distributed, heterogeneous, high performance
computing (HPC). In this role, SmartNet strives to:

• Maximize computing power,

• Increase the throughput of a set of jobs,

• Optimize cost-effectiveness,

• Leverage existing resources, and

• Ensure robust scheduling. 

In this context, the term “framework” means that SmartNet provides a mech- 
anism that can enhance the performance of existing systems, such as DCEs or RMSs. 
As a framework, SmartNet was also designed so that it can easily accommodate new 
scheduling criteria and heuristics. This makes SmartNet a viable tool for a majority 
of HPC sites, regardless of the type of task and resource management that is currently 
utilized at that site. SmartNet can be applied to nearly any environment where the 
dynamics of the scheduling problem require a near-optimal solution.

2. Functionality 

SmartNet is designed to allow a single administrator to manage the entire sys- 
tem. Users submit tasks to SmartNet. As tasks are received by the SmartNet server, 
they are placed into a database, a schedule is created or updated, and the tasks are 
run when the schedule indicates they should be. The database is a simple plain text 
file with a particular (and strict) format that is cached in memory when SmartNet 
is running. It is from this database that the server gets its job/machine estimated 
run-time (ETC values) information and to which the server adds new experiential in- 
formation. This database information is the source of information for the construction 
of the schedule. Given the job/machine ETC values in the database, the scheduling 
algorithms are applied to create a near-optimal schedule. The server initiates the 
schedule and tracks the behavior of all jobs throughout the entire run-time process. If 
a job runs longer than anticipated, it can be terminated or flagged. Such a “rogue job” 
might cause an e-mail message to be generated from SmartNet to the original tasking
entity, letting that group or user know that something was wrong with their job. As 
jobs complete, experiential data is collected and saved into a database. As experien- 
tial data is gathered, “learning” occurs, and SmartNet changes compute characteristic 
and expected time to complete (ETC) data in the database [Ref. 2]. 
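
The two bookkeeping steps described above, updating the ETC from experiential data and flagging a rogue job, can be sketched as follows. This is an illustrative assumption about how such logic might look, not SmartNet's code: the class name `EtcRecord`, the running-mean update, and the rogue threshold of three times the ETC are all hypothetical.

```python
class EtcRecord:
    """Running average of observed run-times for one job-machine pair."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, observed):
        """Fold one observed run-time into the average and return the new ETC."""
        self.count += 1
        self.mean += (observed - self.mean) / self.count
        return self.mean


def is_rogue(elapsed, etc, factor=3.0):
    """Flag a job whose elapsed time exceeds `factor` times its ETC.

    The factor is a made-up policy parameter for this sketch.
    """
    return elapsed > factor * etc


record = EtcRecord()
for observed in (10.0, 14.0, 12.0):
    record.update(observed)
```

The incremental update avoids storing every past run-time, at the cost of keeping only the mean; a record that also tracked variance would be needed to fit a run-time distribution.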
D. SMARTNET ARCHITECTURE
1. SmartNet Processes 



SmartNet is made up of several different processes, each with its own mission, 
yet relying upon messages to pass data between its processes. These processes include 
the Scheduler, the SmartNet Database, the Learning and Accounting Process, and the 
Controller. Messages exchanged consist of Requests, Control Information, and Data. 
Figure 4 depicts the relationships of these pieces. 




Figure 4. SmartNet Architecture, from [Ref. 2].



a. Interfaces 

There are two user interfaces, one for the user who is submitting a job 
to be run and one for the SmartNet system administrator who oversees the proper 
operation of SmartNet. Graphical and command line versions exist for each. Users
can set priorities for their jobs, but the system administrator has ultimate control
[Ref. 1].

b. The Controller

The actual execution of jobs on resources may be controlled by any one 
of several facilities, including Resource Management Systems (RMSs), other versions 
of SmartNet, or Distributed Computing Environments (DCEs) [Ref. 2]. 

c. The Scheduler 

The SmartNet Scheduler contains both optimization and scheduling al- 
gorithms. There is a need for multiple algorithms because no polynomial algorithm 
optimally schedules for all environments. New schedulers can be added by the SmartNet
system administrator to take advantage of changing or unanticipated environments.
Optimization is key to the performance of SmartNet. SmartNet can implement
any number of optimization criteria, although at present it contains only heuristics that
maximize throughput by minimizing the completion time of the last job to finish.
Optimization criteria are what direct SmartNet to utilize specific search and schedul- 
ing algorithms. The algorithms built into the SmartNet scheduler are discussed in 
Section 2 [Ref. 2]. 

d. The Database 

The SmartNet database is an ASCII text file containing information 
about sites, groups, machines, models (jobs), and model-machine pairs. The database 
can be built or edited by hand, but the SmartNet Editor is a good tool to use, as it 
forces the administrator to input required data and writes the database in the proper 
format. SmartNet is not forgiving of improper formatting. As the database is parsed,
data is evaluated and placed into objects commensurate with the order of data in
the file [Ref. 10]. Appendix A shows the fields of the database and the information
contained therein. Of particular importance is the expected time for completion (ETC)
field in the model-machine listings. This ETC data is what SmartNet uses to create
a schedule. The finish times of jobs must be either estimated by the programmer or
collected by SmartNet over the course of several runs in order for SmartNet to create
anything close to a near-optimal schedule. Chapter IV contains detailed information 
about the changes that we made to this database, and to routines that read and write 
to the database, in order to perform our experiments. 

e. The Learning and Accounting Process 

Presently, SmartNet’s algorithms for learning and accounting are rudi- 
mentary. The framework exists, though, to permit easy integration of additional 
algorithms. As we mentioned in Section 2, rogue processes are tracked and reported. 
The action taken upon discovering a rogue process is specified by the user or system 
administrator at startup. Another form of learning and accounting that occurs is the 
gathering of experiential data after job completion. SmartNet will collect run-time 
statistics and write them out to the database file, making use of the information later 
during the scheduling and execution of similar jobs. [Ref. 1] 

f. The Controller

The Controller enters the picture when jobs terminate, jobs become 
rogue processes, new job requests are input, and when machines or networks go down. 
All of the above events may cause SmartNet to create a new schedule or re-start certain 
uncompleted jobs. The controller is designed to allow SmartNet to: 

• allow redundancy in critical environments, 

• operate in environments where resource availability is not guaranteed, 

• be integrated with an RMS and provide scheduling assistance to that RMS, 
and 

• coordinate the efforts of multiple RMSs [Ref. 1]. 

2. SmartNet Algorithms 

SmartNet uses a number of algorithms to create a schedule. The general char- 
acteristics of these algorithms are discussed below. 
a. Exhaustive Algorithm

An Exhaustive Algorithm provides a “brute force” solution to the schedul- 
ing problem. Every possible data combination is generated and compared. Because 
this scheduling problem is NP-complete, this algorithm, which produces an optimal
result, can only be used with very small data sets [Ref. 6].

b. Greedy Algorithms 

Greedy Algorithms make the best local choice available at a specific 
point in the search tree [Ref. 6, pages 329-336]. For instance, if a Greedy algorithm 
is to choose the cheapest candy, and is searching a row of candy including a 75 cent 
Milky Way, a 55 cent Almond Joy, and a 35 cent package of Trident, it will choose 
the Trident over the other two. This appears to be an optimal solution; however, it is 
an optimal choice only among the candy considered at that point in the search tree. 
It is a best local choice. If a twenty cent box of Tic-Tacs lies on another row, it is 
the cheapest candy, and so the true optimal choice. Whether or not this decision aids 
in the production of an optimal solution depends upon the parameters of the entire 
problem. Since the Greedy Algorithms look for the best choice at some point in the 
search tree, complete consideration of the effects of the choice upon the end result 
is not made. Greedy algorithms are deterministic and produce only near-optimal 
results. SmartNet uses both an O(mn) algorithm, which we call Fast Greedy, and an 
O(mn^2) Greedy algorithm. 

c. Evolutionary 

Hartmut Pohlheim presents a fine explanation of evolutionary algorithms, 
portions of which are included here. 

Evolutionary algorithms are stochastic search methods that mimic the 
metaphor of natural biological evolution. Evolutionary algorithms operate on a 
population of potential solutions applying the principle of survival of the fittest 
to produce better and better approximations to a solution. At each generation, 
a new set of approximations is created by the process of selecting individuals 
according to their level of fitness in the problem domain and breeding them 
together using operators borrowed from natural genetics. This process leads 
to the evolution of populations of individuals that are better suited to their 
environment than the individuals that they were created from, just as in natural 
adaptation. 

[I]t can be seen that evolutionary algorithms differ substantially from 
more traditional search and optimization methods. The most significant dif- 
ferences are: 

• Evolutionary algorithms search a population of points in parallel, not a 
single point. 

• Evolutionary algorithms do not require derivative information or other aux- 
iliary knowledge; only the objective function and corresponding fitness 
levels influence the directions of search. 

• Evolutionary algorithms use probabilistic transition rules, not deterministic 
ones. 

• Evolutionary algorithms are generally more straightforward to apply. 

• Evolutionary algorithms can provide a number of potential solutions to a 
given problem. The final choice is left to the user. (Thus, in cases where 
the particular problem does not have one individual solution, for example a 
family of pareto-optimal solutions, as in the case of multi-objective optimization 
and scheduling problems, then the evolutionary algorithm is potentially 
useful for identifying these alternative solutions simultaneously.) [Ref. 11] 



d. Simulated Annealing 

Simulated annealing is a stochastic optimization method useful for find- 
ing global minimum cost configurations of NP-complete combinatorial problems with 
cost functions having many local minima [Ref. 12]. 

Simulated annealing builds on an analogy between the way metals con- 
tract with decreasing temperature into a minimum energy crystalline structure and 
the way searches for a minimum can be performed. After metal is heated and manip- 
ulated, it must be cooled. The best way to cool metals is to do it slowly. This allows 
the molecular makeup of the metal to slowly contract and “settle” upon itself, which 
reduces the probability of cracks, “bubbles,” and otherwise weak bonds throughout 
the entire mass of the metal structure. If metal is heated and then cooled very quickly, 
the contraction of the molecular structure tends to settle into local minima rather than 
to contract into a more stable, true minimum. The metallurgic process of annealing then 
compares to stochastic optimization methods like this: The heated metal is the random 
state that needs to be reduced to some sort of minimum. In SmartNet, this would 
be the minimum time for completion of all jobs being scheduled. The temperature is 
a parameter that governs the probability of increasing the cost function at any step in 
the search for the global minimum [Ref. 12]. 

The simulated annealing algorithm requires a valid solution space, a way 
to randomly move about in the solution space, a method for evaluating cost functions, 
and an annealing schedule. The annealing schedule includes the initial “temperature” 
variant and rules for decreasing that temperature throughout the search process [Ref. 12]. 

Simulated annealing has several advantages. Specifically, simulated annealing: 

• can deal with arbitrary systems and cost functions, 

• statistically guarantees finding a near-optimal solution, 

• is relatively easy to code, even for complex problems, and 

• generally produces “good” solutions. 

This makes simulated annealing an attractive, but computationally expensive, option 
for optimization problems where heuristic (specialized or problem specific) methods 
are not available. [Ref. 12] 
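
The four requirements named earlier (a solution space, a way to move randomly within it, a cost function, and an annealing schedule) can be illustrated with a short sketch. This is not SmartNet's implementation. The run-time values are those of Table III, while the move rule, the geometric cooling schedule, the parameter values, and the names `cost` and `anneal` are assumptions chosen for brevity.

```python
import math
import random

# Expected run-times from Table III: ETC[job][machine], machines ordered A, B, C.
ETC = [[33, 5, 22], [2, 49, 56], [13, 12, 17], [15, 3, 9]]


def cost(assignment):
    """Cost function: the largest total run-time assigned to any one machine."""
    loads = [0] * len(ETC[0])
    for job, machine in enumerate(assignment):
        loads[machine] += ETC[job][machine]
    return max(loads)


def anneal(initial, temperature=50.0, cooling=0.95, steps_per_temp=20,
           min_temp=0.01, seed=0):
    """Minimal simulated annealing over job-to-machine assignments (assumed parameters)."""
    rng = random.Random(seed)
    current, best = list(initial), list(initial)
    while temperature > min_temp:
        for _ in range(steps_per_temp):
            # Random move: reassign one randomly chosen job to a randomly chosen machine.
            candidate = list(current)
            candidate[rng.randrange(len(candidate))] = rng.randrange(len(ETC[0]))
            delta = cost(candidate) - cost(current)
            # Always accept improvements; accept a worse schedule with
            # probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = candidate
                if cost(current) < cost(best):
                    best = list(current)
        temperature *= cooling  # annealing schedule: geometric cooling
    return best


start = [0, 0, 0, 0]          # every job on Machine-A
result = anneal(start)
```

The temperature plays the role described above: while it is high, cost-increasing moves are accepted often, and as it falls the search settles into a minimum. The computational expense is visible in the nested loops, which evaluate the cost function several thousand times even for this four-job problem.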

e. Future Efforts 

As SmartNet is still a work in progress, there are continual efforts to 
develop better performing algorithms. 

E. SMARTNET PERFORMANCE 

Previous work with SmartNet, detailed in [Ref. 1], provides the following 
information concerning schedules generated by SmartNet. 

The performance data shown in Tables I and II was developed from several 
scheduling problems run on SmartNet in simulation mode. The scheduling problems 
varied in both the number of jobs being scheduled and the number of machines available, 
as well as the amount of heterogeneity. Each problem had between two and 1000 jobs 
and between two and 500 machines. The two modes of heterogeneity used were: 

• Consistent Architectures. Given a set of machines, if one job runs faster on a 
particular machine, then all jobs will run faster on that particular machine. 

• Mixed Architectures. Given a set of machines, one job running faster on 
a particular machine has no bearing on how other jobs might run on that 
particular machine. No generalizations about the performance of all the jobs 
on these machines can be deduced. 

The algorithms were judged on how well they minimized the last job’s completion 
time. Knowing that finding an optimal schedule is an NP-complete problem [Ref. 1], 
the baseline used for comparison was derived from a lower-bound algorithm. This 
algorithm does not produce a valid schedule, but does obtain a time known to be less 
than the time at which the last job will complete. 
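
The text does not describe the lower-bound algorithm itself. As an illustration of how a bound can be obtained without building a schedule, the sketch below uses a standard makespan bound that is assumed here and is not taken from SmartNet: no schedule can finish before the longest job run on its fastest machine, nor before the total of all such best-case run-times divided evenly among the machines. The function name and the sample data (the values of Table III) are chosen for illustration only.

```python
def makespan_lower_bound(etc):
    """Return a time that no valid schedule can beat; no schedule is built.

    etc[j][m] is the expected run-time of job j on machine m.
    """
    best_times = [min(row) for row in etc]        # each job on its fastest machine
    n_machines = len(etc[0])
    longest_job = max(best_times)                 # some machine must run this job
    perfect_split = sum(best_times) / n_machines  # best-case work spread evenly
    return max(longest_job, perfect_split)


# Four jobs (rows) on three machines (columns), values as in Table III.
etc = [[33, 5, 22], [2, 49, 56], [13, 12, 17], [15, 3, 9]]
bound = makespan_lower_bound(etc)
```

Because contention between jobs is ignored, the value returned is generally not achievable, which is consistent with a baseline that is useful for comparison but does not correspond to a valid schedule.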

Table I provides average time of completion of the last job in a schedule for a 
variety of architectures and algorithms. The numbers represent time, and show that 
the schedule produced with a SmartNet Greedy algorithm (MinMin) is better than 
either the OLB or LBA generated schedules. 





                     Scalable arch.   Arch. Mix   Arch. Mix 
jobs/machines        500/100          500/100     1000/500 
LBA                  100              86.2        422.6 
OLB                  5.47             4.01        7.33 
MinMin (SmartNet)    3.78             3.14        4.01 



Table I. SmartNet Performance: Average values for the time t at which the last job 
in a schedule completes. 



Table II shows OLB, LBA, and SmartNet’s Greedy algorithms’ performance 
relative to a lower bound. After normalizing to the lower bound, the table shows 
that given a 500/100 job/machine ratio on mixed architectures, the SmartNet Greedy 
algorithm completes six percent slower than the lower bound. OLB completes 
28% slower than this time. LBA, on the other hand, completes 2,650% slower than 
this time. 





                     Scalable arch.   Arch. Mix   Arch. Mix 
jobs/machines        500/100          500/100     1000/500 
LBA                  26.5             27.5        105.5 
OLB                  1.45             1.28        1.83 
MinMin (SmartNet)    1.13             1.06        1.29 



Table II. SmartNet Performance: Average values of t compared to our lower bound. 
t is the time at which the last job in a schedule completes. Our lower bound is 
represented as 1.00. This table shows that when SmartNet schedules 500 jobs on 100 
mixed-architecture machines, the schedule is completed in six percent more time than 
our lower bound. From [Ref. 1]. 
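
The percentages quoted in the text follow directly from the normalized values in Table II. The short calculation below reproduces them for the 500/100 mixed-architecture column; the function name `percent_slower` is chosen here for illustration.

```python
# Table II values for 500 jobs on 100 mixed-architecture machines
# (completion time divided by the lower bound).
normalized = {"LBA": 27.5, "OLB": 1.28, "MinMin": 1.06}


def percent_slower(ratio):
    """Percentage by which a schedule exceeds the lower bound (lower bound = 1.00)."""
    return round((ratio - 1.0) * 100)


slowdown = {name: percent_slower(r) for name, r in normalized.items()}
for name, pct in slowdown.items():
    print(f"{name}: {pct}% slower than the lower bound")
```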



F. EXAMPLES 

These examples help explain both how SmartNet works and how a knowledge 
of both machine load and anticipated job performance can create a better schedule. 

We consider the following scenario: There are three machines, Machine-A, 
Machine-B, and Machine-C. Each machine is of a different architectural design 
(SIMD, MIMD, and Vector, respectively). There are four jobs, Job1, Job2, Job3, 
and Job4, each with different compute characteristics. Table III provides ETC values 
for the job-machine pairs. 

1. Example 1: Opportunistic Load Balancing 

OLB is a method by which jobs are scheduled based upon the current loads on 
the machines. Figure 5 shows one possibility of how an OLB scheduler might schedule 
jobs to run on several machines. In this scenario, the OLB algorithm places the next 
job in the queue of the next available machine. If the jobs are ordered in the queue 
according to increasing job ID order, and if machines become available in the order 
Machine-A, Machine-C, Machine-B, and Machine-B, the jobs will be scheduled as in 
Figure 5. Job1 on Machine-A finishes at 33, Job2 on Machine-C finishes at 56, and 
Job3 and Job4 on Machine-B finish at 12 + 3 = 15. We note that the time of completion 
for all jobs is 56. 

          Machine-A   Machine-B   Machine-C 
          (SIMD)      (MIMD)      (Vector) 
Job1      33          5           22 
Job2      2           49          56 
Job3      13          12          17 
Job4      15          3           9 

Table III. Job Run-times used in all examples. 



[Figure 5 (image not recovered): schedule rows for Machine A, Machine B, and Machine C.] 




2. Example 2: Limited Best Assignment 

LBA schedulers assign jobs to machines based upon each job’s expected performance 
on each of the machines. In other words, the jobs are assigned to the 
machines upon which they should perform the best (i.e., have the shortest expected 
run-time) [Ref. 1]. We note that this algorithm assumes that each job that it schedules 
is the only job in the system. Again, Table III provides the expected run-time data 
used in this example. 
[Figure 6 (image not recovered).] 




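Figure 6 shows how an LBA scheduler would schedule the four jobs on the 
three machines. Job2 is assigned to Machine-A (2), while Job1, Job3, and Job4 are all 
assigned to Machine-B (5 + 12 + 3 = 20). We note here that the expected time of 
completion for all jobs is 20. 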

3. Example 3: Greedy Algorithm 

This example uses a Greedy Algorithm. This algorithm takes into account 
both machine loads (like OLB) and run-time performance (like LBA) to produce a 
near optimal schedule. Again, Table III provides the expected run-time data used in 
this example. 

Figure 7 provides a SmartNet schedule for the Table III data. Here, the earliest 
expected run-time completion for all jobs is 15. This is significantly better than either 
the OLB or LBA schedulers from Examples 1 and 2. 
[Figure 7 (image not recovered): schedule rows for Machine A, Machine B, and Machine C.] 
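
The three examples can be checked with a short sketch that applies each policy to the Table III data. It is an illustration under stated assumptions, not SmartNet's code: the OLB function simply takes the machine availability order stipulated in Example 1 as input, and the greedy function is a MinMin-style rule (repeatedly commit the job/machine pair with the earliest completion time), which is assumed here to correspond to the Greedy algorithm of Example 3. All function names are hypothetical.

```python
# Expected run-times from Table III: ETC[job][machine].
ETC = {
    "Job1": {"A": 33, "B": 5, "C": 22},
    "Job2": {"A": 2, "B": 49, "C": 56},
    "Job3": {"A": 13, "B": 12, "C": 17},
    "Job4": {"A": 15, "B": 3, "C": 9},
}
MACHINES = ["A", "B", "C"]


def makespan(assignment):
    """Completion time of the last job when each machine runs its jobs back to back."""
    load = {m: 0 for m in MACHINES}
    for job, machine in assignment.items():
        load[machine] += ETC[job][machine]
    return max(load.values())


def olb(availability_order):
    """OLB as in Example 1: the next queued job goes to the next available machine."""
    return dict(zip(ETC, availability_order))


def lba():
    """LBA: each job goes to its fastest machine, ignoring machine load."""
    return {job: min(MACHINES, key=lambda m: ETC[job][m]) for job in ETC}


def min_min():
    """Greedy: repeatedly commit the job/machine pair with the earliest completion time."""
    ready = {m: 0 for m in MACHINES}      # time at which each machine becomes free
    unscheduled = set(ETC)
    assignment = {}
    while unscheduled:
        finish, job, machine = min(
            (ready[m] + ETC[j][m], j, m) for j in unscheduled for m in MACHINES
        )
        assignment[job] = machine
        ready[machine] = finish
        unscheduled.remove(job)
    return assignment


results = {
    "OLB": makespan(olb(["A", "C", "B", "B"])),
    "LBA": makespan(lba()),
    "Greedy": makespan(min_min()),
}
print(results)
```

The greedy rule accounts for both load (through `ready`) and expected performance (through `ETC`), which is why it avoids stacking three jobs on Machine-B as LBA does. Each of the n scheduling steps scans all remaining job/machine pairs, so this particular sketch performs on the order of mn^2 comparisons for n jobs and m machines.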
III. DISCRETE EVENT SIMULATION 



A. INTRODUCTION 

This chapter explains discrete event simulation. Section B provides background 
information concerning simulation in general and explains why discrete event simula- 
tion is a useful tool. Section C describes discrete event simulation in detail. Random 
variates are explained in Section D. Section E presents concluding remarks. 

B. BACKGROUND INFORMATION 

The desire to predict the performance of a system has led to the need to study 
both the system’s performance and behavior. This desire is the driving force behind 
much academic and industrial research. In this context, a system might be: 

• an actual mechanical entity, such as an automobile or a building, 

• some measurable non-mechanical entity, such as a hurricane or an ecosystem, 
or 

• a process or sequence of events involving both human and mechanical functions 
similar to the logistic example posed in Chapter I. 

One characteristic common to the types of systems listed above is that they 
possess measurable parameters that influence their behaviors. For example, an auto- 
mobile has the variable parameters velocity and acceleration, as well as the constant 
parameters weight, mass, and coefficient of friction. Performance of an automobile 
is affected by all of the above parameters. Parameters may be restricted to a designated 
range. A study of an automobile’s performance would utilize these variable 
and constant parameters, as well as any restrictions in effect, and provide performance 
predictions specific to the input parameters. Such a study would be helpful in 
determining how an automobile might perform, given modifications to its weight or 
coefficient of friction. There are several methods available to study this or any system. 
While, in this case, the most obvious would be to study an actual automobile, there 
are severe limitations to this method. It would be difficult, if not impossible, to make 
adjustments to the coefficient of friction without altering the shape of the automobile. 
Changing the shape of an automobile is difficult. The need to change the coefficient 
of friction, for example, limits the utility of experimenting on the automobile itself. 
In this case, and for many other types of systems, it is probably easier to construct a 
model. Figure 8 shows the different ways systems can be studied. 




Figure 8. Ways to study a system, from [Ref. 13, page 4]. 



A model of a system can be constructed either mathematically or physically. 
Depending upon the complexity of the system, both can be difficult. There are obvious 
limitations and difficulties associated with constructing a physical model of a logistic 
system used to move troops and equipment from the United States to a foreign area 
of operations*. Physical modeling would involve scaling a global problem down to a 
manageable size. In a high fidelity physical model, every physical feature of the logistic 
operation might be physically rendered. Physical features requiring duplication in this 
case would include the loading of ships and aircraft, troop movements, and airfield 
operations. The difficulty in making such a model accurate is obvious. Physical 
modeling to a reduced scale also introduces inaccuracy in many areas, not the least 
of which is the non-linearity of design characteristics between full and reduced size 
entities. 

An alternate approach to physical modeling is mathematical modeling. Any 
physical system can be reduced to a mathematical model that represents those aspects 
of the system that the modeler desires to measure and control. In our logistic example, 
the loading of aircraft can be mathematically modeled as taking a deterministic amount 
of time dependent only on the type of cargo being loaded. Transit time can be modeled 
also as a deterministic amount of time, perhaps by using the average of historical data. 
Actual cargo can be modeled using its weight, mass, and measurement parameters and 
considered a “puzzle piece” to be moved, shifted, and transported in accordance with 
the priorities provided by the force commander. In general, a mathematical model 
is an order of magnitude less expensive to produce than a physical model. 
Additionally, the designer can easily modify the fidelity of the various aspects of the 
system that are deemed important. 

There are two methods for studying mathematical models: analysis and simulation. 
The analytical approach to studying a model requires the solution of mathematical 
equations. If the system being modeled is complex, though, it may be impossible 
to develop mathematical equations that consider the combined effects of every in- 
terrelated or critical piece of the model. Increasing the accuracy, or fidelity, of the 
model may require very complex mathematical equations. As an example, we consider 
modeling, in great detail, the logistic example from Chapter I. 

*See the example provided in Chapter I. 
To model a single fork lift, we would need to mathematically represent such 
things as the mean time between failure of the engine, the fork lift mechanism, and 
the tires. Also, rate of failure of the operator, driving speed, lifting speed, haul rates, 
machine-to-task suitability, fuel consumption rate, and maintenance schedules would 
need to be modeled. As we see, there are numerous details in modeling a single fork 
lift, and the fork lift itself is only a single, small part, of the entire logistic system. 
There may be four different types of fork lifts at a single airport, and a total of forty 
fork lifts, altogether. The complexity of modeling forty fork lifts is greater than one 
fork lift, but even if modeling them is easy, forty fork lifts as a whole are still a small 
but vital piece of the logistic system. Further, more complex pieces of the logistic 
system would need to be included in the model. 

• Fuel. There is a finite number of refueling trucks, as well as a finite 
amount of jet fuel. The delivery of fuel to the airfield, the process of refueling 
aircraft, and the performance characteristics of the personnel and machines 
involved in the entire refueling process would need to be modeled. 

• Scheduling. Scheduling is an NP-complete problem. The airfield has a max- 
imum physical capacity. The airfield also has a maximum workload under 
which it can operate. Every asset at the airfield needs to be scheduled so that 
the process of getting personnel and equipment onto aircraft and subsequently 
overseas works in accordance with the intent of the commander. Introducing 
scheduling into an analytical model may make it too complex to find a closed 
form solution. 

• The Human Factor. In every environment where people are working under 
stressful conditions, accidents occur. When medium and large scale machinery 
are present, severe accidents are possible. Accident and injury rates must 
be modeled. Further, the consequences of these same accidents and injuries 
must also be modeled. For example, we consider the effect that the following 
scenario might have on the operation of an airfield: A heavy fork lift operator 
is loading an extremely large metal storage container on a C-5 cargo plane, 
which is also being refueled. The fork lift operator has a heart attack and 
loses control of the fork lift. The fork lift drives the storage container through 
the side of the C-5, wrecking the jet’s extensive hydraulic system. The refueler 
operator, seeing the situation, performs an emergency disengage of the refueler 
from the aircraft. His refueler dumps 500 pounds of highly flammable jet fuel 
on the tarmac. We see that such scenarios, when modeled with great fidelity, 
are mathematically very complex. The individual effects may be easy to model; 
however, the comprehensive effect of the individual events may not be easy to 
model. 

If, when using the analytical approach, very realistic assumptions and high 
fidelity are required, closed form solutions may be impossible, forcing the mathematical 
modeler to make simplifying assumptions that can cause the results to be useless. 
Suppose that the probability of a devastating accident involving a C-5 aircraft on the 
ground during refueling is 0.0001. Further, it is known that the probability distribution 
is Gamma(0.0001, 15). If an accident of this type occurs, the airfield’s cycle rate of 
aircraft is decreased by 10%. The Gamma distribution does not have a closed form 
with these parameters. The mathematical modeler might choose to represent the 
probability of this event occurring, then, with an exponential distribution, because it 
has similar characteristics to the gamma distribution, and the exponential distribution 
and its inverse are both closed form expressions. Because of the need to simplify the 
mathematical model, the model no longer provides the desired accuracy, which may 
result in incorrect performance estimates. 
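
The closed forms referred to in this example can be written out explicitly. The notation is introduced here for illustration and does not appear in the original text: \lambda > 0 is the rate of the exponential distribution, and k and \theta are the shape and scale of the Gamma distribution.

```latex
% Exponential distribution: both the distribution function and its inverse are elementary.
F_{\mathrm{exp}}(x) = 1 - e^{-\lambda x}, \quad x \ge 0,
\qquad
F_{\mathrm{exp}}^{-1}(u) = -\frac{\ln(1-u)}{\lambda}, \quad 0 \le u < 1 .

% Gamma distribution: the distribution function involves the lower incomplete
% gamma function \gamma(k, \cdot), which has no elementary inverse in general.
F_{\mathrm{gamma}}(x) = \frac{\gamma\!\left(k, x/\theta\right)}{\Gamma(k)},
\qquad
\gamma(k, z) = \int_{0}^{z} t^{\,k-1} e^{-t}\, dt .
```

The exponential case is the special case k = 1 of the Gamma distribution, which is one sense in which the two distributions have similar characteristics.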

An important part of modeling is simplification. Simplification is a method of 
reducing or removing specific complex factors which can be accounted for by other 
means. Using the fork lift example above, if, in reality, the fork lift breaks once every 
10,000 hours, the modeler may be able to assume that the fork lift will not break. 
Ample consideration must be given to the possibility of skewing the results obtained 
from the model because of poor simplifying assumptions. If the fork lift actually 
breaks once every 10 hours, that factor would probably need to be included due to 
the frequency of occurrence. 

A simulation, executed on a computer, also uses a mathematical model. When 
building simulations, it is easy to increase the fidelity of certain aspects of the system 
while decreasing the fidelity of others. We again consider the fork lift discussed above. 
A simulation model of a fork lift may not need to model fine details such as the mean 
time between failures, fuel consumption, lifting speed, and maintenance schedules. It 
may make sense to consider all of these factors as one and model the work performed 
per hour. Such a simplification would reduce the complexity of the model, and might 
make it easier to evaluate. Simulation models are evaluated via their state variables. 
State variables are those parameters that are required to describe the model (and so, 
the system) at a particular point in time. 

Simulation models can be classified along four dimensions: 

• Static versus Dynamic. A static model is a snapshot of a system at a particular 
time, while a dynamic model is evolutionary. 

• Deterministic versus Stochastic. A deterministic model has no random com- 
ponents. Output is a deterministic function of input. A stochastic model is, in 
contrast, non-deterministic. 

• Continuous versus Discrete Time. A continuous time model is one in which the 
state variables change continuously over time. A discrete time model is one for 
which the state variables change instantaneously at separate (discrete) points 
in time. 

• Continuous versus Discrete States. A continuous state model is one in which 
the values of the state variables can take on any of a defined range of values. 
A discrete state model is one in which the values of the state variables are 
restricted to a subset of acceptable values. 

The type of simulation used to provide results in this thesis is dynamic, stochastic, 
and discrete in nature. This type of simulation is commonly called discrete event 
simulation. 



C. DISCRETE EVENT SIMULATION 
1. Overview 

Discrete event simulation models a system’s activity as it progresses through 
time. The operation of a system can be thought of as a collection of events that make 
up the system’s activity. An event is “any instantaneous occurrence that may change 
the state of the system.” [Ref. 13, page 7] Events occur at different times, and are 
stamped with the time at which they occur. The state of the system is, informally, 
its current condition. System state is defined by system specific state variables that 
describe the system’s condition [Ref. 13, page 81]. As events that are to occur in the 
future are generated as a byproduct of simulating a current event, they are stored in 
an event queue, where they stay until the simulation clock advances to the time of their 
occurrence. Events in event queues are often ordered according to the simulation time 
at which they are to occur. As the discrete event simulation progresses, individual 
events are taken out of the event queue and processed. When an event is removed 
from the event queue for processing, the simulation clock is advanced to the time 
stamp on that event. 
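
The event queue and clock described above can be sketched in a few lines. The following is a minimal illustration, not the simulator used in this thesis: the function name `run`, the handler table, and the event names and delays are all hypothetical. A heap keeps the queue ordered by time stamp, and a sequence number breaks ties between events with equal time stamps.

```python
import heapq


def run(initial_events, handlers, until=float("inf")):
    """Next-event time advance: pop the earliest event, jump the clock to it, process it."""
    queue = list(initial_events)      # entries are (time, sequence, name)
    heapq.heapify(queue)
    sequence = len(queue)
    clock = 0.0
    log = []
    while queue:
        time, _, name = heapq.heappop(queue)
        if time > until:
            break
        clock = time                  # the clock jumps directly to the event's time stamp
        log.append((clock, name))
        # Processing an event may generate future events as (delay, name) pairs.
        for delay, new_name in handlers.get(name, lambda now: [])(clock):
            heapq.heappush(queue, (clock + delay, sequence, new_name))
            sequence += 1
    return clock, log


# Hypothetical model: each arrival schedules an unload, each unload a departure.
handlers = {
    "arrive": lambda now: [(2.0, "unload")],
    "unload": lambda now: [(3.5, "depart")],
}
clock, log = run([(0.0, 0, "arrive"), (1.0, 1, "arrive")], handlers)
```

In this sketch the clock never moves in fixed steps; it takes only the values 0.0, 1.0, 2.0, 3.0, 5.5, and 6.5, the time stamps of the six events processed.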

Discrete event simulation characteristically requires three sets of variables. 

• Time variable t. t is used to track elapsed simulation time and is also called 
the simulation clock. 

• Counter variables. These are used to track repetitions of certain events and 
the time that they occur. 

• System state variables. These are model/system dependent; they describe the 
state of the system at any given time [Ref. 14, page 81]. 

The advancement of time in discrete event simulation can be a difficult concept 
to understand. The elapsed simulation time and the actual time required to run a 
simulation are usually different. The time required to run a simulation may be greater 
or less than the elapsed simulation time, and is dependent upon the particular model. 
An example of a model where the run time would probably be greater than the simulated 
time is the simulation of subatomic particle movement. An example of a simulation 
that would probably require less run time than simulated time is the simulation of 
continental drift. 
Advancement of the simulation clock is usually done via one of two methods: 

• Next-Event time advance. Time is advanced whenever an event occurs. 

• Fixed-Increment time advance. Time is advanced at fixed intervals. 

Next-Event time advance is the most prevalent method [Ref. 13, pages 7-9]. 

Figure 9 depicts the flow of control for a next-event time advance discrete model. 
Figure 9. Flow of control in Discrete Event Simulation, from [Ref. 13, page 12]. 

2. An Example of Discrete Event Simulation 

Discrete event simulation can be applied to the logistic system described in 
Chapter I. The mission to be accomplished, using the logistic system, is the efficient 
movement of troops and supplies from various locations throughout the United States 
and other allied nations to some foreign area of operation. This system provides 
numerous examples of the difficulties found when building a near optimal schedule 
for the use of logistic assets. It is also a good system to demonstrate the utility and 
suitability of discrete event simulation. Of particular note, however, is the difficulty 
of modeling any system this complex and large. Akin to this difficulty is the need for 
specific problem statements. In other words, we need to know what we are modeling 
and why. It is often infeasible to model every aspect of such a system with great 
fidelity, as the model, including dependencies between subsystems, would 
be too complex. 




Air Transport Flow Into and Out Of 
The Area of Operation 

Figure 10. Logistic Example: Air transport assets into and out of Somalia. 



An important factor in the success of a logistic system is the capability, performance, 
and scheduling of air transport assets. Whenever U.S. forces deploy to 
foreign soil for both peace keeping and combat missions, multiple plans for troop and 
equipment build-up in that area are developed. The plans include rosters of units 
(troops and equipment) that will be deployed and schedules designating when the 
units are to arrive. The deployment of forces can take from several days to several 
months in order to reach the force structure needed to fulfill the requirements of the 
mission. The theater commander will be very concerned about reaching his desired 
in-theater force structure, as it will drive his ability to begin, continue, and complete 
the mission. The logisticians must plan the movement of assets into the area of operations 
as efficiently and effectively as possible to allow the theater commander to mass 
his forces appropriately. Transportation of equipment and troops by air can help meet 
initial force build-up requirements both efficiently and quickly. 

The commander’s desires specific to air transport scheduling and availability 
can be simulated using discrete event simulation. Two important questions that the 
simulation must answer for the commander are “How long will it take for my 
forces and their equipment to be transported into the area of operations?” 
and “Given the planned scenarios, which one most rapidly places the major- 
ity of my fighting forces and their equipment on the ground?” One approach 
to answering the commander’s questions is to build a computer model and simulate 
the movement of each force structure into the area of operation, and report the length 
of time required. The goal is to use the simulation as one of the many tools available 
to the commander. 

Discrete event simulation has direct application to modeling the flow of aircraft 
into an area of operations. We consider the following pseudo-algorithm: 

• loop begins 

1. Aircraft[aa] arrives at fromUSAirfield[bb] 

2. Aircraft[aa] is ready to be unloaded 

3. Aircraft[aa] is ready to be loaded 

4. Aircraft[aa] is loaded 

5. Aircraft[aa] departs fromUSAirfield[bb] for AREA_OF_OPERATIONS 

6. Aircraft[aa] arrives at AREA_OF_OPERATIONS 

7. Aircraft[aa] is ready to be unloaded 

8. Aircraft[aa] is ready to be loaded 

9. Aircraft[aa] is loaded 

10. Aircraft[aa] departs AREA_OF_OPERATIONS for toUSAirfield[cc] 

• loop ends 
The above list enumerates several events that a discrete simulation of the system 
might incorporate. The dynamics of this problem dictate that the above ten 
events must occur at some point, and in the stated order, during every round trip 
flight of an aircraft (Aircraft[aa]) from the United States (fromUSAirfield[bb]) 
to a foreign airfield (AREA_OF_OPERATIONS) and back to the United States 
(toUSAirfield[cc]). 

The “discrete event” aspect of the simulation refers to the time interval between 
specific events. The amount of time advanced is dependent upon what is going on in 
between the two events. While a detailed discussion of a discrete event simulation for 
the above example is beyond the intent of this section, an explanation of what occurs 
between two of the events will suffice. We consider the events in lines 1 and 2 above: 

1. Aircraft[aa] arrives at fromUSAirfield[bb] 

2. Aircraft[aa] is ready to be unloaded 

Event 1 is labeled with the time (Simulated_time_1) that an aircraft arrives at a U.S. 
airfield. Event 2 is labeled with the time (Simulated_time_2) that the same aircraft 
is ready to be unloaded. The duration between event 1 and event 2, in reality, is 
determined by the amount of time the aircraft is idle on the ground, which is affected 
by the number of other aircraft already on the ground as well as the rate at which 
those aircraft can be unloaded. The duration between events 1 and 2 in the simulation 
is either deterministic or stochastic. DeltaT represents this duration. The time 
advance function might proceed as follows. 

1. SIMULATION CLOCK = Simulated_time_1 

2. Simulated_time_2 = Simulated_time_1 + DeltaT 

3. SIMULATION CLOCK = Simulated_time_2 

In our example, DeltaT is determined by a distribution that is based upon observed 
data. If an aircraft must always wait the same amount of time before being unloaded 
after it arrives at the airfield, a constant could be used for DeltaT. If the amount 
of time that an aircraft must wait to be unloaded after it lands at the airfield is not 
fixed, the probabilistic nature of that duration must be recreated for the simulation. 
Recreation of this random process requires the following: 

• The identification of the mathematical distribution that matches the distribu- 
tion of times that the aircraft must wait to be unloaded. 

• The generation of a random variate^, DeltaT, from the mathematical distribu- 
tion previously identified. 

The strength of discrete event simulation is evident when the simulation is 
actually performed. Actually loading and unloading the aircraft may require several 
days. However, because discrete event simulation instantaneously advances simulated 
time to the time of the next event, the simulation may only require several seconds. 
The SIMULATION CLOCK is advanced at each event by the appropriate real
world DeltaT, and the simulation terminates with realistic results in significantly less 
time than the actual sequence of events. 
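
The time-advance mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any particular simulator: the function name `run`, the event labels, and the numeric values of the arrival time and DeltaT are all hypothetical.

```python
import heapq

def run(events):
    """Process (time, name) events in time order; return the clock values visited."""
    queue = list(events)
    heapq.heapify(queue)              # future event list, ordered by event time
    clock = 0.0
    trace = []
    while queue:
        event_time, name = heapq.heappop(queue)
        clock = event_time            # jump directly to the time of the next event
        trace.append((clock, name))
    return trace

# Hypothetical values, in hours.
arrival = 5.0                         # Simulated_time_X
delta_t = 2.5                         # DeltaT: wait before unloading can begin
trace = run([(arrival + delta_t, "ready to unload"), (arrival, "arrives")])
```

No simulated time is spent between the two events: the clock moves from 5.0 to 7.5 in a single step, which is why a process lasting days can be simulated in seconds.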

D. RANDOM VARIATES 

The very nature of discrete event simulation requires it to incorporate stochastic 
processes to account for the inherent randomness in the system. We again consider the 
logistic example used throughout this chapter. While the process of moving troops, 
supplies, and equipment from the United States to a foreign shore is a highly sched- 
uled, well planned operation, there is unavoidable randomness in the system. As an 
example, we consider the effect of mechanical failure on air transport flow. Data, such 
as the time between failures, can be gathered for the relevant aircraft. This data can 
then be analyzed statistically to determine the mean and variance, and a distribution 
fitted to the failure rates. Using this information, the failure can be simulated so 

^Random variates are explained in Section D. 
as to occur randomly according to a distribution that has been fit to the observed
data. The simulation, then, is capable of demonstrating the effect of a decrease in the 
movement rate of aircraft into the area of operations. Further simulation work may 
include modeling how the logistician or commander adapts to the lost air transport 
movement capability and implements an updated flow plan. 

1. Random Versus Pseudo-random Numbers 

Knuth provides a good definition of the term random. 

[The idea of randomness often invokes] philosophical discussions about what 
the word “random” means. In a sense, there is no such thing as a random 
number; for example, is 2 a random number? Rather, we speak of a sequence 
of random numbers with a specified distribution, and this means loosely that 
each number was obtained merely by chance, having nothing to do with other 
numbers of the sequence, and that each number has a specified probability of 
falling in any given range of numbers. [Ref. 15, page 2] 

After computers were introduced, people began looking for efficient ways to
obtain random numbers using computer programs. Several methods were investigated,
but none proved efficient or simple enough to gain acceptance. These problems led
to an interest in the production of random numbers using the arithmetic operations 
of computers. John von Neumann suggested the “middle-square” method in 1946. 
The idea is to take a number chosen at random, square that number, then extract the 
middle digits to produce the next random number. The problem with this method is 
that there really is not any randomness in the process. Each number is completely 
determined by the one before it. However, the sequence of numbers appears to be 
random. The generation of sequences of random numbers deterministically is usually
called pseudo-random number generation. Within most textbooks, as well as in this
thesis, sequences are termed random, with the understanding that sequences only
appear to be random. [Ref. 15, page 3]

If a random sequence of numbers is generated deterministically, that sequence
can then be reproduced. Is this ability to reproduce a sequence of numbers from
a random number generator really undesirable, though? In many cases, it is, in
fact, desirable. There are many occasions where the precise behavior of a simulated 
stochastic process might need to be reproduced multiple times. The only way to do 
this is to reproduce the sequence of random numbers used previously. This technique 
is particularly useful during debugging, when the performance of the simulator may 
need to be consistent in order to rule out anomalous factors. [Ref. 13, page 424] 
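
Both points above, the deterministic nature of the middle-square method and the usefulness of reproducible sequences, can be illustrated with a short Python sketch. The function `middle_square` and its four-digit default are assumptions made for this illustration; the second part uses Python's standard `random.Random` class only to show that identical seeds give identical sequences.

```python
import random

def middle_square(seed, count, digits=4):
    """Return `count` numbers produced by the middle-square recurrence."""
    values = []
    x = seed
    for _ in range(count):
        squared = str(x * x).zfill(2 * digits)   # pad the square to 2*digits digits
        start = digits // 2
        x = int(squared[start:start + digits])   # keep the middle `digits` digits
        values.append(x)
    return values

sequence = middle_square(1234, 3)

# Two generators started from the same seed reproduce the same sequence.
first = random.Random(2024)
second = random.Random(2024)
reproduced = [first.random() for _ in range(3)] == [second.random() for _ in range(3)]
```

Each value is completely determined by its predecessor, so rerunning `middle_square(1234, 3)` always yields the same three numbers.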

2. Random Variates and Distribution Characteristics 

A random variate is a random observation generated from a probability distri- 
bution [Ref. 13, pages 11, 462]. A probability distribution has specific characteristics 
that are referred to as the first, second, and third moment. Table IV shows the para- 
meters that characterize several well known types of distributions. 



Distribution      Parameter 1        Parameter 2        Parameter 3
Gaussian          Mean               Variance           NA
Exponential       Mean               NA                 NA
Uniform           Smallest Limit     Largest Limit      NA
Weibull, Gamma    Shape Parameter    Scale Parameter    NA
Lognormal         Scale Parameter    Shape Parameter    NA

Table IV. Parameters of Various Distribution Functions.



We will use the Gaussian (Normal) distribution as an example in this section. 
Figure 11 shows a histogram of a Gaussian distribution of 100,000 random variates 
distributed around a mean of 100 with standard deviation 15. Random variates can 
be thought of as the x-axis values. The frequency of x-axis values is plotted along the 
y-axis. The Gaussian curve shows us that there are more random variates near the
mean, and fewer as you move away from the mean. An explanation of how random 
variates can be generated from this information can be found in Section 3. 
Figure 11. An Example of a Gaussian Distribution, mean of 100, standard deviation
of 15.



3. Generating Random Variates 

First, we present a short summary of what we have discussed thus far. A 
stochastic process is a process that contains some probabilistic components. In order 
to accurately simulate a stochastic process, those aspects of the process that occur 
randomly must retain their random nature in the simulation. In order to simulate a 
stochastic process, then, specific information about the nature of the random factors 
must be known. 

For example, we again consider the fork lift. We assume that the rate at which
the wrong cargo (in error) is loaded on an aircraft is a random parameter that must
be considered in a simulation of the fork lift. Experimental data may show that the
mean time between loading errors per fork lift is 100 hours, where the data from
which this information was gathered behaves as a Gaussian (Normal) distribution
with mean 100 and standard deviation 15. This example was used to produce Figure
11. Given this information, the simulation of the fork lift can incorporate a random
error corresponding to this known behavior. Instead of a constant value of 100 hours
for the time between loading errors, a factor can be added to the simulation
that causes the fork lift to load the wrong cargo randomly, but at time differences
generated from a Gaussian distribution of mean 100 and standard deviation 15.

The mechanics of generating random variates are specific to the distribution in 
question; however, every method relies upon a source of independent and identically 
distributed (IID) random variates uniformly distributed on the interval (0, 1) [Ref. 13, 
pages 462-463]. These are commonly called IID U(0,1) random variates. The most 
important aspect of generating random variates, then, is a valid source of IID U(0,1) 
random variates. While there are numerous random number generators available for 
particular languages and operating systems, the user must ensure that the random 
number generator they choose to use is in fact IID U(0,1). 

There are several general classes of approaches for generating random variates
from an IID U(0, 1) generator.

• Inverse Transform. This method is best used for generating random variates
with a distribution function F that is continuous and increasing when
$0 < F(x) < 1$. The technique is to generate $U \sim U(0, 1)$ and return random
variate $X = F^{-1}(U)$. [Ref. 13, pages 465-474]

• Composition. This technique applies when the distribution function can be
best expressed as a combination of other distribution functions. When the dis-
tribution function F can be expressed as a convex combination of distribution
functions $F_1, F_2, \ldots$, it may be easier to gather sample random variates
from the $F_j$'s than from the original F. [Ref. 13, pages 474-475]

• Convolution. The term convolution “comes from the terminology in stochastic
processes where the distribution of X is called the m-fold convolution of the
distribution of $Y_j$.” [Ref. 13, page 477] This technique is best suited for distri-
butions for which the generation of random variable X is more easily expressed
as a sum of several IID random variables. The implementation of this technique
involves the generation of $Y_1, Y_2, \ldots, Y_m$, IID, each with distribution function
F, and the subsequent return of random variate $X = Y_1 + Y_2 + \cdots + Y_m$. [Ref.
13, pages 477-478]

• Acceptance-Rejection. This is a less direct approach than the aforemen-
tioned techniques, yet is still useful, particularly when a more direct method
is too difficult or costly. This method requires the specification of a function
t that majorizes^ the density function f. This technique involves generating a
Y that has density $r(x) = t(x)/c$, where $c = \int t(x)\,dx$, and generating a
$U \sim U(0, 1)$ that is independent of Y. If $U \le f(Y)/t(Y)$, the method returns
the random variate X = Y; otherwise, it generates a new Y and U and repeats
the test. [Ref. 13, page 478]
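
Three of the approaches listed above can be sketched briefly in Python. The target distributions here are chosen only for illustration and are assumptions, not distributions used elsewhere in this thesis: a two-component exponential mixture for composition, a sum of three exponentials for convolution, and the density f(x) = 2x on [0, 1] with the constant majorizing function t(x) = 2 for acceptance-rejection. All function and parameter names are hypothetical.

```python
import math
import random

def composition(rng, p=0.3, mean_a=1.0, mean_b=10.0):
    """Mixture p*Expo(mean_a) + (1-p)*Expo(mean_b): choose a component, then sample it."""
    mean = mean_a if rng.random() < p else mean_b
    return -mean * math.log(1.0 - rng.random())

def convolution(rng, m=3, mean=2.0):
    """Return the sum of m IID exponential variates."""
    return sum(-mean * math.log(1.0 - rng.random()) for _ in range(m))

def acceptance_rejection(rng):
    """Sample the density f(x) = 2x on [0, 1] using t(x) = 2 as the majorizing function."""
    while True:
        y = rng.random()              # Y has density r(x) = t(x)/c = 1 on [0, 1]
        u = rng.random()              # U ~ U(0, 1), independent of Y
        if u <= (2.0 * y) / 2.0:      # accept when U <= f(Y)/t(Y)
            return y

rng = random.Random(1)
sample = [acceptance_rejection(rng) for _ in range(5)]
```

The expression `1.0 - rng.random()` lies in (0, 1], so the logarithm is always defined.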

The method used to generate a random variate should be chosen based upon 
the particular distribution the random variate is to be drawn from, and the ease and 
reliability with which random variates can be generated for that distribution. The 
generation of random variates is considered reliable if the occurrence of individual 
random variates is statistically equivalent to the distribution from which they are 
derived. [Ref. 13, page 463] 

If the distribution is of a known type, implementations are readily available that 
require little work and promise the accurate generation of random variates. Otherwise, 
the easiest method to implement is most likely Inverse Transform. Inverse Transform 
can be an easy method because random variates are generated from the inverse of 
the distribution function F; inverting the distribution function may be a simple task. 
However, for some distributions, the inverse cannot be written in closed form. For
example, the Gaussian distribution function cannot be inverted analytically because
it does not have a closed form expression [Ref. 13, pages 465-466]. While there are
numerical methods to evaluate $F^{-1}$ when there is no closed form, such an Inverse
Transform may not be the most computationally efficient method to use. If the distribution in question is
multi-modal, or a combination of two or more different distributions, random variate 
generation becomes more difficult, and Composition or Convolution should be used. 



^Majorizes: $t(x) \ge f(x)$ for all $x$.
a. Generating Gaussian Random Variates

The Gaussian distribution is characterized by the first moment (mean)
and second moment (variance). A random variate $X \sim N(0, 1)$ can be used to obtain
some $X' \sim N(\mu, \sigma^2)$ by setting $X' = \mu + \sigma X$. The ability to generate this data from
the first and second moments is helpful, because it allows us to focus on obtaining
standard Gaussian random variates ($N(0, 1)$). Random variates particular to any
Gaussian distribution can be obtained using the above computation. [Ref. 13, pages
490-491]
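
As a small illustration of this scaling step, the sketch below maps standard Gaussian variates to a Gaussian with mean 100 and standard deviation 15, the values used for Figure 11. The function name `to_gaussian` is hypothetical, and Python's `random.Random.gauss` is used here only as a convenient stand-in source of N(0, 1) variates.

```python
import random

def to_gaussian(x, mu, sigma):
    """Map a standard N(0, 1) variate x to an N(mu, sigma^2) variate."""
    return mu + sigma * x

rng = random.Random(11)
samples = [to_gaussian(rng.gauss(0.0, 1.0), 100.0, 15.0) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
```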

There are two commonly used methods for obtaining standard N(0, 1)
random variates. The first is the Box and Muller method, which is effective but has
a limitation when used with linear congruential random number generators (LCGs).
(LCGs are explained below.) We now explain the Box and Muller method, and then
explain this limitation. The Box and Muller method begins by generating two random
variates, $U_1$ and $U_2$, from an IID U(0, 1) generator. The variables $X_1$ and $X_2$ are
generated using the following formulae.

$X_1 = \sqrt{-2 \ln U_1} \cos(2\pi U_2)$

$X_2 = \sqrt{-2 \ln U_1} \sin(2\pi U_2)$

$X_1$ and $X_2$ are then IID N(0, 1) random variates. The limitation alluded to above
arises when $U_1$ and $U_2$ are not true IID U(0, 1) random variables, but are
dependent, which can occur if $U_1$ and $U_2$ are generated using the same seed.
Linear congruential generators rely on recursion to generate numbers. The recursive
formula for a linear congruential generator is as follows.

$Z_i = (a Z_{i-1} + c) \bmod m$

In this formula, m is the modulus, a is the multiplier, c is the increment, and $Z_0$ is
the starting value or seed [Ref. 13, page 425]. The problem occurs because $U_2$ is a
function of $U_1$ as shown in the recursive relation above. This dependency can cause
$X_1$ and $X_2$ to fall on a spiral in $(X_1, X_2)$ space, because they are not independent,
identically distributed random variates. Because of the possibility of this kind of
restrictive dependency, the Box and Muller method should not be used when only a
single stream of a linear congruential generator is available, but can be used if two
U(0, 1) random variables from separate seeds are available. [Ref. 13, pages 425, 491]
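
The recurrence and the Box and Muller transformation can be sketched together in Python. This is an illustration only: the class name `LCG` is hypothetical, and the multiplier 16807 with modulus 2^31 - 1 and increment 0 is one well-known parameter choice, assumed here rather than taken from this thesis. Following the recommendation above, the two uniform variates are drawn from two separately seeded streams.

```python
import math

class LCG:
    """Linear congruential generator: Z_i = (a * Z_{i-1} + c) mod m."""

    def __init__(self, seed, a=16807, c=0, m=2**31 - 1):
        self.z, self.a, self.c, self.m = seed, a, c, m

    def random(self):
        """Advance the recurrence and return U_i = Z_i / m."""
        self.z = (self.a * self.z + self.c) % self.m
        return self.z / self.m

def box_muller(u1, u2):
    """Transform two independent U(0, 1) variates into two N(0, 1) variates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

# Two streams with different (arbitrary) seeds supply U_1 and U_2.
stream_a = LCG(12345)
stream_b = LCG(67890)
pairs = [box_muller(stream_a.random(), stream_b.random()) for _ in range(20_000)]
```

With a nonzero seed and c = 0, this generator never produces Z = 0, so the logarithm in `box_muller` is always defined.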

A second method for obtaining standard N(0, 1) random variates is
known as the polar method. This method is suitable for use with a single linear
congruential generator seed. N(0, 1) random variates are generated using the following
algorithm [Ref. 13, pages 491-492].

1. Generate $U_1$ and $U_2$ as IID U(0, 1) variables.

2. Let $V_i = 2U_i - 1$ for $i = 1, 2$.

3. Let $W = V_1^2 + V_2^2$.

4. If $W > 1$, go back to step 1.

5. If $W \le 1$,

let $Y = \sqrt{(-2 \ln W)/W}$,

let $X_1 = V_1 Y$,

let $X_2 = V_2 Y$.

6. $X_1$ and $X_2$ are IID N(0, 1) random variates.
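
The steps of the polar method translate almost directly into code. The following Python sketch is illustrative; the function name `polar_pair` is hypothetical, and Python's `random.Random` is assumed as the U(0, 1) source in place of a linear congruential generator. The extra guard against W = 0 avoids taking the logarithm of zero.

```python
import math
import random

def polar_pair(rng):
    """Return two IID N(0, 1) variates generated by the polar method."""
    while True:
        v1 = 2.0 * rng.random() - 1.0          # step 2: V_i = 2*U_i - 1
        v2 = 2.0 * rng.random() - 1.0
        w = v1 * v1 + v2 * v2                  # step 3
        if 0.0 < w <= 1.0:                     # step 4: otherwise try again
            y = math.sqrt(-2.0 * math.log(w) / w)
            return v1 * y, v2 * y              # step 5

rng = random.Random(7)
values = [x for _ in range(20_000) for x in polar_pair(rng)]
mean = sum(values) / len(values)
variance = sum((x - mean) ** 2 for x in values) / len(values)
```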

b. Generating Exponential Random Variates

The other distribution needed for our SmartNet simulator was the expo-
nential distribution. The exponential distribution is characterized by the first moment,
sometimes called the mean or simply $\beta$. While the polar method is best suited for
generating Gaussian random variates, the inverse-transform method proves to be the
simplest and most accurate method for generating exponential variates. It is suitable
because both the exponential distribution function and its inverse can be expressed
using closed form equations. An exponential random variate X can be generated using
the following simple algorithm [Ref. 13, page 486].

1. Generate U as an IID U(0, 1) variable.

2. Let $X = -\beta \ln U$.

3. Return X.



Figure 12 shows an exponential distribution with mean 100. 
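
The inverse-transform algorithm for the exponential distribution is short enough to show in full. In this Python sketch the function name `exponential_variate` is hypothetical, and `random.Random` is assumed as the U(0, 1) source. Because `rng.random()` can return exactly 0, the sketch uses 1 - U, which is also U(0, 1) but never zero.

```python
import math
import random

def exponential_variate(rng, mean):
    """Return an exponential variate with the given mean by inverse transform."""
    u = 1.0 - rng.random()          # lies in (0, 1], so log(u) is always defined
    return -mean * math.log(u)

rng = random.Random(3)
samples = [exponential_variate(rng, 100.0) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)
```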
Figure 12. An Example of an Exponential Distribution, mean of 100.

E. CONCLUDING REMARKS 

This chapter has explained simulation in general, discrete event simulation in 
particular, and described in detail the generation of random variates for use in discrete 
event simulations. The next chapter will explain how discrete event simulation and 
random variate generation have been added directly to SmartNet [Ref. 1, 2, 3, 4].



IV. THE SMARTNET SIMULATOR



A. INTRODUCTION 

This chapter explains changes and enhancements made to the original Smart-
Net simulator¹ [Ref. 16]. The use of Discrete Event Simulation in the SmartNet
simulator is discussed in Section C. Section D describes how we went about alleviat- 
ing the limitations of the original SmartNet simulator. 

B. BACKGROUND INFORMATION 

As we saw in Chapter II, SmartNet is a very capable scheduling framework 
with numerous and powerful operational modes. One of those modes is the SmartNet 
simulator mode. The simulator itself has powerful features that make it a useful tool; 
however, it also possesses certain limitations².



C. DISCRETE EVENT SIMULATION AND THE SMARTNET SIMULATOR

The SmartNet simulator permits the operation of all aspects of SmartNet to be
simulated using discrete event simulation. As we saw in Chapter III, when performing
discrete event simulation, we need to identify events that trigger both the advancement
of simulated time and the collection of system state variable data. Two of the events
currently tracked by the SmartNet simulator are:

1. Job Start: This event occurs when a job is started (the actual execution of
the job is simulated when SmartNet is run in simulator mode) on a machine
in accordance with the schedule created by SmartNet.

2. Job End: This event occurs when job execution completes.



¹The explanation of SmartNet provided in Chapter II provides more detailed definitions of many
terms found in this chapter.

²Several of these limitations have been corrected via this research. Those changes are discussed
within this chapter and in Appendices B and C.
These are two of the most important events to SmartNet’s run-time performance
because they are the crucial components of the execution of the schedule that SmartNet
creates³. These two events bracket a job’s run-time, a duration that can take anywhere
from micro-seconds to several days, depending upon the job and the machine. As a
job begins, the time of its Job Start event is recorded and reported. When that same
job completes (a Job End event), the run-time duration of that job is reported, and
the simulation clock is advanced to that point. In SmartNet simulation mode, the job
does not actually execute, but a simulated run-time is used instead. The result is the 
ability to simulate the execution of a schedule that might take several days to run if 
the jobs were allowed to actually execute, but which takes several minutes instead. 
Figure 13 is an example demonstrating both the strength of discrete event simulation 
in SmartNet and illustrating event occurrences. 

Unfortunately, we do not know what the exact run-time duration of a particular 
job on a particular machine would be. When SmartNet is actually running, start and 
finish times of jobs reflect actual wall clock time⁴. In this case, run-times are real.
However, because the simulator does not actually execute jobs, an estimate of the 
actual run-time duration is needed. 
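
The bracketing of each job by a Job Start and a Job End event can be sketched as follows. This Python fragment is a simplified illustration, not SmartNet code: the function name `simulate_machine` and the three run-time values (in seconds) are assumptions, and the jobs are simply run back to back on one machine.

```python
def simulate_machine(run_times, start=0.0):
    """Return (clock, event, job) tuples for jobs run back to back on one machine."""
    clock = start
    events = []
    for job, duration in enumerate(run_times, start=1):
        events.append((clock, "Job Start", job))
        clock += duration                 # advance simulated time to the Job End event
        events.append((clock, "Job End", job))
    return events

# Three hypothetical jobs with simulated run-times of 1 hour, 2 hours, and 30 minutes.
events = simulate_machine([3600.0, 7200.0, 1800.0])
```

The simulated clock reaches 12600 seconds after only six events, while the real time needed to process them is negligible.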

1. Advantages of the SmartNet Simulator 

Using the SmartNet simulator provides definite advantages, both from the as- 
pect of experimental capabilities and from the aspect of design. We already mentioned 
its capability to simulate the execution of complex schedules in several minutes that 
would, in reality, require days to complete. This capability gives SmartNet research- 
ers the opportunity to compare the performance of different scheduling algorithms. 
Furthermore, there are design advantages because the simulator mode is built directly 

³While the creation of a near-optimal schedule is the true benefit gained from using SmartNet, it
is not an event on which we concentrate in our simulation experiments.

⁴Wall clock time is time as we perceive it throughout our day-to-day activities. It is the time we
keep on the clocks in our home.







Figure 13. Real Time versus Simulated Time in the SmartNet Simulator. Three jobs, 
scheduled on one machine. The figure depicts simulated time advancement, real time, 
and event occurrences. 

into SmartNet, helping the researcher to place a greater degree of confidence upon
their research results. When using the SmartNet simulator, we are actually running
SmartNet in simulation mode. This is important for two reasons. First, the simulator
is an integral part of SmartNet, as opposed to being a removable segment of code or
another application altogether. This means that the schedule, scheduling algorithms,
database, default files, and inter- and intra-process communication resulting from or
used by SmartNet in true operational mode are also used by SmartNet when run in
simulation mode. Second, any and all changes to SmartNet source code, to include
updates, implicitly change or update the simulator. There is no need for a duplication 
of effort, with one team working on improving SmartNet and another team working 
on improving a simulation of SmartNet [Ref. 2]. We have an economy of effort that
results in a better simulation tool. 

2. Limitation of the Original SmartNet Simulator 

The original SmartNet simulator had one major limitation. As we have seen, 
the simulator uses Expected Time to Complete (ETC) values for each job/machine 
pair, provided in the database, to build the schedule. Schedule-building is the intended 
use of the ETC values. As a first attempt, the original simulator was built to use the 
ETC values found in the database as the simulated job run-time duration. This meant 
that simulated jobs always ran for the exact amount of time they were scheduled to 
run. In reality, even when a job is the only load on all of the resources, the non- 
determinism associated with reading from/writing to disks and memory results in 
two different run-times for the same job with the same input. It is very difficult to 
exactly predict job run-times. 

Therefore, our simulator should be able to simulate run-times of jobs according 
to run-time distribution characteristics found in various compute environments. We 
know that if a job is run repeatedly on a specific machine, it will almost never complete 
with the same duration. For example, if we run JOB1 1000 times on MACHINE-A, we
may see 1000 different run-times. These 1000 run-time durations can be characterized
by the distribution that they form. This distribution is specific to JOB1 running on
MACHINE-A⁵. For instance, JOB1 running on MACHINE-A might always take at least 741.67 seconds
to complete. The distribution of the completion times above 741.67 seconds might 
approximate an exponential distribution with mean 2.97. 
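
A run-time distribution of this shape, a fixed minimum plus an exponentially distributed excess, can be sampled with the inverse-transform method. The Python sketch below uses the example values from the text (minimum 741.67 seconds, excess with mean 2.97); the function name `shifted_exponential` is hypothetical and `random.Random` is assumed as the uniform source.

```python
import math
import random

def shifted_exponential(rng, minimum=741.67, mean_excess=2.97):
    """Return a run-time equal to a fixed minimum plus an exponential excess."""
    return minimum - mean_excess * math.log(1.0 - rng.random())

rng = random.Random(9)
run_times = [shifted_exponential(rng) for _ in range(20_000)]
average = sum(run_times) / len(run_times)   # expected to be near 741.67 + 2.97
```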

⁵JOB1 running on MACHINE-B may have an altogether different run-time distribution. This is
particularly true if MACHINE-B and MACHINE-A are machines with different architectures or with 
different processing capabilities. 
D. ALLEVIATING THE SMARTNET SIMULATOR LIMITATION

The SmartNet simulator needed to be modified so that the scheduled jobs that 
it simulates do not always execute for exactly the mean run-time. Specifically, we 
needed to alter the simulator so that run-time durations are not always identical to 
the ETC values used to create the schedule. The simulated run-time durations need to 
vary; however, they need to vary realistically. This should be done by incorporating 
run-time distribution data into the generation of simulated run-times. We have made 
these changes; they are presented in the following section. 

1. Enhancements Made to the SmartNet Simulator 

We enhanced the SmartNet simulator to allow job run-times to be derived from
a run-time distribution. Doing so allowed jobs to be run with durations that varied
in a well-defined way and were not always equal to the ETC values. The ETC values
are either the mean of historical run-time durations or user estimates. Permitting jobs 
to run for non-ETC times entailed changes to both the simulator itself as well as to 
the I/O routines that read and write the SmartNet database. We added the ability 
to specify, within the database, not only a job’s mean run-time, but also its type 
of distribution (recognizing both Gaussian and exponential distributions for reasons 
explained later) and both its second and third moments. 
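
To make the idea concrete, the sketch below shows one way a per-job record carrying a distribution type and moments could drive the generation of a simulated run-time. It is a hypothetical illustration only: the class `RunTimeModel`, its field names, and the function `draw_run_time` are assumptions and do not reflect the actual SmartNet database schema or code, and truncating the Gaussian at zero by resampling is likewise an assumption.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class RunTimeModel:
    mean: float                  # first moment, the ETC value
    distribution: str            # "gaussian" or "exponential"
    variance: float = 0.0        # second moment; unused for the exponential
    third_moment: float = 0.0    # stored for other distribution types; unused here

def draw_run_time(model, rng):
    """Draw one simulated run-time from the distribution described by `model`."""
    if model.distribution == "exponential":
        return -model.mean * math.log(1.0 - rng.random())
    if model.distribution == "gaussian":
        sigma = math.sqrt(model.variance)
        while True:                          # resample until positive (assumption)
            value = rng.gauss(model.mean, sigma)
            if value > 0.0:
                return value
    raise ValueError(f"unknown distribution: {model.distribution}")

rng = random.Random(5)
job = RunTimeModel(mean=100.0, distribution="gaussian", variance=225.0)
simulated = [draw_run_time(job, rng) for _ in range(10_000)]
```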

Due to the modular fashion in which SmartNet is built, the number of changes
that we had to make to the actual code, above and beyond adding our own libraries,
was small. However, we did spend a substantial amount of time reading the SmartNet
code, identifying and fixing bugs, and correcting its Makefiles to operate correctly at
our site [Ref. 17]. Appendices B and C provide detailed explanations of the files that
we altered and created. We also enumerate the changes that we made to each file.
In our explanations in Appendices B and C, we name the enhanced and added files
relative to the SOLARIS directory, which is where the SmartNet source code is installed.
We will assume that these files will be located in the SOLARIS/src/sn/program/
subdirectory, unless otherwise explicitly stated.



E. CONCLUDING REMARKS 

With our enhancements, we now have a simulator that gives more realistic 
performance than the original version. We can alter characteristics of the run-time 
distribution for any and all job-machine pairs. Further, we have the ability to add 
additional distribution types with relative ease, since the random number generators, 
distribution name, and 1st, 2nd, and 3rd moments are already included in the database
during the simulation. 
V. EXPERIMENTS



A. INTRODUCTION 

This chapter explains the simulation experiments we performed on SmartNet
using the SmartNet simulator. The initial goal of the simulation experiments was to
determine whether using intelligent scheduling would be beneficial, even if the jobs
that were scheduled did not run for exactly the amount of time that we expected. In
particular, we were concerned about whether it would still be beneficial to use intelli-
gent scheduling if one or several jobs run for a substantially different amount of time
than expected. Because determining a perfect schedule is an NP-complete problem,
SmartNet, a scheduling framework for heterogeneous high performance computing,
contains many different (polynomial) scheduling heuristics [Ref. 1]. These in-
clude several O(mn^2) Greedy Algorithms¹, an O(mn) Fast Greedy Algorithm, an
O(mn) Limited Best Assignment (LBA) Algorithm, an O(mn) Opportunistic Load
Balancing (OLB) Algorithm, and a variable complexity genetic algorithm. SmartNet
pioneered the use of intelligent schedulers that accounted for both the Expected Time
to Complete (ETC) of a job on each different machine and the expected load on each
machine. In our simulation experiments we use the O(mn^2) Greedy Algorithms, the
O(mn) Fast Greedy Algorithm, the OLB Algorithm, and the LBA Algorithm. All
of the algorithms, except the OLB Algorithm, use the ETC value to compute the
schedule. The LBA Algorithm does not take into account the expected load on the
machines. The primary reason for this study is that jobs rarely execute for ex-
actly the ETC time, which in SmartNet’s case is generally the average of previous
run times with the same compute characteristics [Ref. 5]. This difference between
actual and predicted run-times often occurs because not all of the compute characterist-
ics [Ref. 5] are known or enumerated by the designer of the user’s program, and
because the time to access memory and/or a disk is stochastic and not deterministic.
In those cases where one or more of the jobs being scheduled have run-times that
could differ substantially from the expected time, we need to determine whether there
is still an advantage to using an algorithm that makes use of expected run-times or
whether a computationally simpler algorithm that does not require looking up ETC
values, such as Opportunistic Load Balancing (OLB), might yield equivalently
good performance.

¹If an administrator installs SmartNet so that it uses these Greedy algorithms, SmartNet computes
schedules for each of three different Greedy-based algorithms and implements the one whose predicted
performance is the best.
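
The distinction between the three kinds of information use can be sketched in Python. The fragment below is a deliberately simplified illustration and is not SmartNet's implementation: the function names, the rule of assigning jobs one at a time in arrival order, and the small ETC matrix are all assumptions made for this example. It shows only which inputs each style of heuristic consults: machine availability alone (OLB), ETC alone (LBA), or both (a fast greedy rule).

```python
def schedule(etc, choose):
    """Assign jobs in arrival order; return (assignment, makespan)."""
    ready = [0.0] * len(etc[0])           # time at which each machine becomes free
    assignment = []
    for row in etc:                       # row[m] = ETC of this job on machine m
        machine = choose(row, ready)
        ready[machine] += row[machine]
        assignment.append(machine)
    return assignment, max(ready)

def olb(row, ready):
    """Next available machine; the ETC values are ignored."""
    return min(range(len(ready)), key=lambda m: ready[m])

def lba(row, ready):
    """Machine with the best ETC; the current load is ignored."""
    return min(range(len(row)), key=lambda m: row[m])

def fast_greedy(row, ready):
    """Machine with the earliest expected completion: current load plus ETC."""
    return min(range(len(row)), key=lambda m: ready[m] + row[m])

# Invented ETC matrix: 4 jobs (rows) by 2 machines (columns).
etc = [[2, 8], [3, 9], [4, 6], [5, 7]]
results = {f.__name__: schedule(etc, f) for f in (olb, lba, fast_greedy)}
```

On this invented matrix the rule that uses both load and ETC finishes earliest, but a single hand-built example says nothing about performance in general; that question is the subject of the experiments in this chapter.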

As we began investigating this problem, we noticed that, for different ETC 
matrices², the performance of the various algorithms differed drastically. Therefore, in
addition to our originally planned study, we categorized certain types of heterogeneity 
and ran experiments for many of these categories. 

We ran our experiments using the SmartNet simulator mode rather than ac- 
tually executing jobs under SmartNet. The simulator mode both gave us greater 
control over the input parameters and allowed us to complete more experiments in a 
reasonable amount of time. We begin this chapter with an explanation of the para- 
meters we varied in the experiments. These parameters include both the distributions 
and various categories of heterogeneity. In Section C, we describe the simulation 
experiments that we performed, present the data, and explain our results. We then
discuss the theoretical performance limits of the SmartNet scheduling algorithms,
compare the performance of SmartNet’s O(mn^2) Greedy Algorithm with its O(mn)
Fast Greedy scheduling algorithm, investigate the dependence of the performance of
SmartNet’s various algorithms on the arrival order of job requests, and finally examine
the performance of some of SmartNet’s algorithms when the matrix representing the
job-machine ETC values is of mixed heterogeneity.

²An ETC matrix represents estimated performance of all the different jobs on all the different
available machines. A specific element of the matrix represents Expected Time to Complete of a 
specific job (row) on a specific machine (column). 
B. PARAMETERS



As we developed the simulation experiments performed for this thesis, we found 
a need to specify two sets of parameters per experiment: 

1. The run-time distributions used, and 

2. the category of heterogeneity involved. 

In order to determine some realistic job/machine run-time distributions that 
we would input into the SmartNet simulator for our experiments, we executed some 
programs on various parallel processors a statistically significant number of times 
and analyzed their run-time distributions. We describe these experiments in detail in 
Section 1. We expound fully on our categorization of job/machine heterogeneity in 
Section 2. 

1. Job Run-time Distributions 

In Chapter III, we explained why job-machine run-times are typically not con- 
stant, but rather vary according to some distribution. We also discussed how we 
enhanced the SmartNet simulator to generate simulated run-time durations from a 
specified distribution, thereby permitting the simulation to more accurately reflect the 
true behavior of jobs. Testing the performance of SmartNet when the run-times of 
jobs are drawn from a particular distribution is essential to this thesis; but first we 
had to determine some realistic distributions that we would use in our simulations. 
Therefore, we repeatedly executed some parallel and sequential programs, gathered 
run-time statistics, and analyzed them. 

We performed several experiments using the NAS Benchmarks [Ref. 18]. The 
NAS Benchmarks were used to determine the types of run-time distributions that may 
be typical for at least some jobs on some machines. We needed to determine sample 
parameters for these run-time distributions so that they could be reproduced by the 
SmartNet simulator. We used distributions and parameters observed during these 
NAS Benchmark tests for the run-time distributions in our simulation experiments. 
While performing these tests, we controlled the following environmental characterist-
ics.

• Server location. We ran experiments where the executable and input data 
and the output generated were located on the executing machine, as well as 
experiments where all of this data was located on a shared file server. 

• Network and server load. When the executable and data were obtained from a 
file server, we ran experiments in which the network and the file server were 
heavily loaded, and others in which both were lightly loaded. 

• Uni- or Multiprocessor. We ran some experiments where the programs had 
been compiled and executed on only a single processor of our Silicon Graphics 
multiprocessor computers, and other experiments where the programs were 
compiled and executed on multiple processors of the same machines. 

• Amount of memory. We ran the jobs on two different multiprocessor Silicon 
Graphics machines that contained substantially different amounts of 
memory. One, caesar, had 64 Mbytes and the other, elvis, had 192 Mbytes. 

• Processor speed. caesar has four 200 MHz MIPS R4400 processors, while 
elvis has four 150 MHz MIPS R4400 processors. 

We used a Silicon Graphics (SGI) Challenge-L multiprocessor machine (caesar) and an 
SGI Onyx multiprocessor machine (elvis) throughout these experiments. Both ran the 
same version of the IRIX64 operating system, version 6.2. We used two machines so 
that the performance characteristics and run-time distributions observed in these 
experiments would give us a broader picture of job run-time characteristics. 
Table V summarizes the configurations of the machines caesar and elvis. 

The jobs that we used throughout these experiments were from two sources: 
NASA’s reference implementation for some of the NAS Benchmarks, and our own 
implementations of other NAS Benchmarks that met the NAS Benchmark criteria. Four 
of the tests used some version of the NAS Integer Sort (IS) Benchmark, implemented 
either in parallel on four processors, or in single processor mode. Two other tests 
used the NAS Embarrassingly Parallel (EP) Benchmark run on a single processor. 
We now explain our experiments and their results. 








                          caesar             elvis
Machine Type              SGI Challenge L    SGI Onyx
Processor Speed           200 MHz            150 MHz
Processor Type            MIPS R4400         MIPS R4400
Number of Processors      4                  4
Amount of Memory          64 Mbytes          192 Mbytes
Secondary Unified
Instruction/Data Cache    4 Mb               1 Mb

Table V. Configuration of SGI machines caesar and elvis. 



a. Integer Sort, Executed on Four Processors 
This experiment examined the run-time distribution of a version of 
the NAS Integer Sort Benchmark executed on four processors. We implemented the 
integer sort using a counting sort [Ref. 6, pages 175-178] algorithm. We used Silicon 
Graphics’ lightweight process (thread) support functions, including m_fork(), 
to implement our version of this benchmark. Below, we provide pseudo-code for the 
counting sort. 

The TOTAL_KEYS initial values to be sorted, which range between 1 and 
MAX_KEY, are stored in the array key_array. The algorithm first counts 
how many of each of the different values between 1 and MAX_KEY there are, storing the 
count in the corresponding element of the array count_array. When the algorithm 
completes, final_array will contain the original values but in sorted order. 

for i = 1 to MAX_KEY 
    count_array[i] = 0 
for j = 1 to TOTAL_KEYS 
    count_array[key_array[j]] = count_array[key_array[j]] + 1 
comment: count_array[i] now contains 
         the number of elements equal to i 
for i = 2 to MAX_KEY 
    count_array[i] = count_array[i] + count_array[i - 1] 
comment: count_array[i] now contains 
         the number of elements less than or equal to i 
for j = TOTAL_KEYS down to 1 
    final_array[count_array[key_array[j]]] = key_array[j] 
    count_array[key_array[j]] = count_array[key_array[j]] - 1 
comment: final_array now holds the sorted keys 

The actual code that we executed on the SGIs is shown in Appendix D. 
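
For readers who want to experiment with the algorithm, the counting sort pseudo-code can be sketched in Python as follows. This is a minimal, single-threaded illustration only; it is not the code from Appendix D, and the name counting_sort and its parameters are assumptions chosen for the sketch.

```python
def counting_sort(key_array, max_key):
    """Sort integer keys in the range 1..max_key with a counting sort."""
    total_keys = len(key_array)

    # count_array[i] holds the number of keys equal to i (index 0 is unused).
    count_array = [0] * (max_key + 1)
    for key in key_array:
        count_array[key] += 1

    # After this prefix sum, count_array[i] is the number of keys <= i.
    for i in range(2, max_key + 1):
        count_array[i] += count_array[i - 1]

    # Walk the keys from last to first so that equal keys keep their order.
    final_array = [0] * total_keys
    for key in reversed(key_array):
        # The pseudo-code is 1-based; Python lists are 0-based, hence the -1.
        final_array[count_array[key] - 1] = key
        count_array[key] -= 1
    return final_array


if __name__ == "__main__":
    print(counting_sort([3, 1, 4, 1, 5, 2, 5], max_key=5))
```

The three loops mirror the three phases of the pseudo-code: counting, prefix summation, and placement.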

We ran this sort across a heavily loaded network, obtaining both the 
executable and data from a file server that was also heavily loaded. When run on 
caesar, the run-time distribution for 100 executions appears Gaussian. Figure 14 
shows a histogram of this distribution. When run on elvis, the run-time distribution, 
again for 100 executions, appears exponential and is shown in Figure 15. We note that 
the truncation of the exponential distribution shown in Figure 15 occurs at approximately 
3.0; that is, the sort had to run for at least 3.0 seconds before stopping. 
The distribution that we see very closely matches an exponential distribution with a 
mean of around 0.20, translated 3.0 seconds to the right. We expect that many jobs 
would have a distribution similar to this, because all jobs have to run for at least some 
amount of time³. 
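
A translated exponential of this kind can be described by two numbers: the shift and the mean of the exponential part. The Python sketch below shows one simple way to estimate both from a list of measured run-times and to draw new samples from the resulting model. The estimator (the smallest sample is taken as the shift) and all function names are assumptions made for illustration; the thesis does not state how its parameters were fitted.

```python
import random
from statistics import mean


def fit_shifted_exponential(run_times):
    """Return (shift, excess_mean) for a translated exponential model.

    Assumption: the shift is estimated by the smallest observed run-time,
    and the exponential mean by the average excess over that shift.
    """
    shift = min(run_times)
    excess_mean = mean(run_times) - shift
    return shift, excess_mean


def sample_shifted_exponential(shift, excess_mean, rng):
    """Draw one run-time: the shift plus an exponential variate."""
    # random.expovariate takes the rate, which is 1 / mean.
    return shift + rng.expovariate(1.0 / excess_mean)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    draws = [sample_shifted_exponential(3.0, 0.20, rng) for _ in range(100)]
    print(fit_shifted_exponential(draws))
```

With the values quoted above, a shift of 3.0 and an exponential mean of 0.20 give an overall mean run-time of 3.2 seconds.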

In these experiments, we also see that memory size, and therefore the need 
to swap to local disk, can have a definite effect upon the run-time distribution of a 
job. The integer sort on elvis completes, on average, 30% sooner than the same job 
on caesar. We note that, in this case, the amount of memory has more influence on 
the run-time of the job than does the speed of the processor. Of primary importance, 
however, is the observation that the same job, running on two different 
machines, not only has different mean run-times, but also has differently shaped 
run-time distributions: Gaussian-like on one machine and exponential-like 
on the other. 

³An exponential distribution is truncated at 0.0. If applied without translation in this case, 
there would be the possibility of a near-zero run-time. 

Figure 14. Forked Counting Sort, caesar. 

b. Integer Sort, Single Processor 

This experiment is the same as that discussed in the last section, with 
the exception of being run on a single processor instead of being distributed across 
four processors. Although a slightly different C++ implementation was used, see 
Appendix D, we again based our program on the counting sort pseudo-code presented 
earlier. 

When the integer sort was run on caesar, the run-time distribution was 
not easily characterized; however, it appears related to a Gaussian distribution. The 
histogram of the distribution, shown in Figure 16, is multi-modal, which indicates 
that multiple distributions may be present. While this experiment does not provide 
us with definitive results, it does point to the fact that run-time distributions can be 
quite complex. 

Figure 15. Forked Counting Sort, elvis. 

When the sequential integer sort was run on elvis, the run-time distributions 
were also multi-modal. Figure 17 shows a histogram of this run-time distribution, 
which is also not easy to characterize. The multiple modes again suggest 
two different distributions that arise under different run-time conditions. 
We suspect that these conditions are related to changes in the network and 
server loads. 

Once again, this set of experiments showed us that additional memory 
can greatly enhance run-time performance. The tests on elvis ran 700% faster than 
those tests run on caesar, which has the faster processors. The tests also show that 
run-time distributions can be very complex, and may be difficult to reproduce in a 
simulation. Although this thesis’ experiments did not use such complex distributions, 
they should be modeled in future work. 

c. Embarrassingly Parallel NAS Benchmark 
The next set of experiments that we describe compared the run-time 
distributions of compute-intensive jobs run from local disk to those run across the 
network from a file server. The tests that we describe in this section were executed 
only on caesar because elvis did not have a sufficiently large local disk available. We 
used the reference implementation [Ref. 18], from NASA, of the NAS Embarrassingly 
Parallel (EP) Benchmark. This implementation uses the portable Message Passing 
Interface (MPI) [Ref. 19] to parallelize the code. The tests we ran, however, were 
compiled to be executed on a single processor⁴. The EP Benchmark was run 100 
times for each test. 

Figure 16. Counting Sort, caesar, single processor. 

Figure 18 shows the run-time distribution of the EP Benchmark run 
100 times when the executable is stored on caesar’s local disk. This distribution 
appears exponential. We see the same effect here as we saw in the integer sort run on 
four processors⁵. There is a shift of 741 seconds to the right, after which we see an 
exponential distribution with mean 2.72. 

⁴The MPI mechanism is still utilized in the EP Benchmark when it is compiled for a single 
processor. 

⁵The number of samples at the far left end of the distribution is small enough, when compared 
to the total number of samples, to be considered a statistical fluke. The data point is included for 
completeness. 

Figure 17. Counting Sort, elvis, single processor. 

We also examined the run-time distribution of the same EP Benchmark 
code when executed on caesar but obtained across a lightly loaded network from a 
lightly loaded file server. Figure 19 shows the histogram from 100 EP Benchmark 
run-times. The run-time distribution appears to be truncated Gaussian⁶. Like the 
experiment above where the EP Benchmark was stored on local disk, the truncation 
value reflects the minimum time that it takes to run this EP Benchmark when the 
executable must be obtained from our particular file server over our local network. 
That truncation appears again at 741 seconds. The difference here, though, is that 
there is a different distribution of run-times throughout the range of values. We 
attribute this to the influence of other loads on the network and file server on the total 
compute time for each job. 

⁶In this thesis, we sometimes use the term “truncated Gaussian” to refer to what is technically an 
Erlang or Gamma distribution. Both Erlang and Gamma distributions are strongly related to both 
Gaussian and exponential distributions. 

Figure 18. epAl NAS Benchmark, Executable Residing on Local Disk. 

2. Categories of Heterogeneity 

The other parameter that we need to examine and that we describe in this sec- 
tion concerns the category of heterogeneity we use in our experiments. We quantify 
the categories of heterogeneity according to two axes, one axis representing the job 
heterogeneity and the other axis representing machine heterogeneity. A heterogeneous 
computing environment is commonly thought of as a network of machines of differing 
or similar architectures, often having, at the very least, differing performance charac- 
teristics such as processor speed, quantities of cache, and amount of main memory. 
For example, two machines may be able to execute the same job, but one machine may 
execute that job an order of magnitude faster than the other machine. If the machines 
are nearly identical, then there is very little heterogeneity amongst the machines. If the 
machines are vastly different, then the collection of machines is very heterogeneous. 
Our categorization of heterogeneity encompasses this common-sense concept, but is 
more general in scope and more technically rigorous in its definition. 



[Figure 18 plot annotations: 100 samples; code on machine, no network involved; mean 743.72; sigma 1.57] 

Figure 19. epAl NAS Benchmark, Files obtained over a lightly loaded network. 

However, both machines and jobs must be considered in any good characterization 
of computational heterogeneity. Jobs, like machines, can be either very 
heterogeneous, slightly heterogeneous (e.g., one instantiation of a C++ compiler and 
another instantiation of the same C++ compiler executing with a higher specified level 
of optimization) or homogeneous (as we might expect to execute on special-purpose 
hardware). As an example, we consider a collection of jobs that is to be scheduled. If 
all the jobs are identical, e.g., all compiling the same source code and using the same 
specified run-time parameters, there is no heterogeneity amongst the jobs. If the jobs 
are all vastly different, then the jobs are very heterogeneous. 

Therefore, we use two axes, one representing the heterogeneity of jobs and the 
other representing the heterogeneity of machines, to categorize the heterogeneity of a 
computing system. The relationship of job and machine heterogeneity is depicted in 
Figure 20, part (a). 

Figure 20. Quadrants of Heterogeneity and Categories of Consistency. Part (a) shows 
the two dimensional relationship of heterogeneity between jobs and machines. Part 
(b) shows the third dimension, consistency, and the numerous planes of consistency 
that can exist in different scenarios. 

We know that SmartNet uses estimates of the run-times of its different jobs 
on its different machines to build a schedule detailing which jobs should run on which 
machine. For our simulation experiments, heterogeneity is introduced through setting 
appropriate parameters in the SmartNet database (See Appendix A). Specifically, 
heterogeneity of both jobs and machines is introduced into SmartNet through 
appropriately setting the ETC values of each job-machine combination present in the 
database. The actual database is quite complex, containing internet addresses of 
machines and (optionally) the longitudinal and latitudinal coordinates of those machines. 
As such, we will represent its heterogeneity information in a more easily understood 
matrix format. An example of such a matrix is shown in Table VI. 

                           Machine
Job                1         2         3         4         5
  1   mean     30034        11       239     30097       533
  2   mean        25      1003      8619        75     65037
  3   mean      1078        93      1950    204001      8081
  4   mean     35096      9501        29      2582      1000
  5   mean        63     45055   1074075     11533        15

                           Machine
Job                6         7         8         9        10
  1   mean        69     42799      1396     52453      4652
  2   mean     30093      4723     11372     16333       287
  3   mean       233         9       193       566     63526
  4   mean     75019     23333       782      1134      1705
  5   mean       403       207      6374    304291       666

Table VI. High-Job, High-Machine Heterogeneity Matrix. 



For Table VI, we note that the average variance⁷ for both the rows and the 
columns is very large, on the order of 10^10. Furthermore, we note that the distribution 
of both the column and row variances is unimodal. These facts indicate that the 
average job-machine run-times shown in this table fall at a point whose coordinates 
correspond to both High-Job Heterogeneity and High-Machine Heterogeneity (See Hi-Hi 
in Figure 20). In contrast, a matrix where the average variance for both the rows 
and the columns might be on the order of 10, would correspond to both lower machine 
and lower job heterogeneity (See Lo-Lo in Figure 20). 
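
The order of magnitude quoted for Table VI can be checked with a short calculation. The Python sketch below computes the average row variance and the average column variance of the Table VI values. It assumes the population variance; the thesis does not say whether the population or the sample form was used, although either gives the same order of magnitude here. The names ETC, average_variances and order_of_magnitude are illustrative.

```python
from math import floor, log10
from statistics import pvariance

# Mean run-times from Table VI: one row per job, one column per machine.
ETC = [
    [30034, 11, 239, 30097, 533, 69, 42799, 1396, 52453, 4652],
    [25, 1003, 8619, 75, 65037, 30093, 4723, 11372, 16333, 287],
    [1078, 93, 1950, 204001, 8081, 233, 9, 193, 566, 63526],
    [35096, 9501, 29, 2582, 1000, 75019, 23333, 782, 1134, 1705],
    [63, 45055, 1074075, 11533, 15, 403, 207, 6374, 304291, 666],
]


def average_variances(etc):
    """Return (average row variance, average column variance)."""
    row_vars = [pvariance(row) for row in etc]
    col_vars = [pvariance(col) for col in zip(*etc)]  # zip(*etc) transposes
    return sum(row_vars) / len(row_vars), sum(col_vars) / len(col_vars)


def order_of_magnitude(value):
    """Return the power of ten of a positive number, e.g. 2.2e10 -> 10."""
    return floor(log10(value))


if __name__ == "__main__":
    row_avg, col_avg = average_variances(ETC)
    print(order_of_magnitude(row_avg), order_of_magnitude(col_avg))
```

Both averages are dominated by the few very large entries in the matrix, which is what places this matrix in the high-heterogeneity corner on both axes.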

Our simulation experiments were built to examine four combinations of 
heterogeneity. It requires approximately 72 hours, not including setup time, to run 
a complete simulation experiment⁸ and approximately six hours to run a Baseline 
experiment⁹. We first chose to examine matrices representing four extreme values in 
our coordinate system. These four combinations can be thought of as quadrants of 
heterogeneity. 

⁷The variances referred to here are variances of the run-time values in the ETC matrices. 

⁸A complete simulation experiment requires that SmartNet build and execute 15 schedules for 
each database and the four different command files. 

• High-Job, High-Machine Heterogeneity (Hi-Hi). All jobs perform very differently 
on all machines. As noted above, the variances for our complete matrix 
in Table VI, of both jobs and machines, are on the order of 10^10. 

• High-Job, Low-Machine Heterogeneity (Hi-Lo). Each individual job performs 
similarly on all machines; however, no jobs perform similarly. For our sample 
matrix in Appendix E, the variance of jobs is on the order of 10^, while the 
variance of machines is on the order of 10®. 

• Low-Job, High-Machine Heterogeneity (Lo-Hi). All jobs perform similarly on 
the same machine; however, the jobs obtain different performance on different 
machines. For our sample matrix in Appendix E, the variance of 
jobs is on the order of 10®, while the variance of machines is on the order of 
lOT 

• Low-Job, Low-Machine Heterogeneity (Lo-Lo). All jobs perform similarly on 
every machine. For our sample matrix in Appendix E, the variance of both 
jobs and machines is on the order of 10®. 

There is a third dimension in the relationship between job and machine 
heterogeneity, however, which we call consistency. Consistency refers to the performance 
similarities of all jobs across machines. If all jobs perform best on the same machines 
(and, correspondingly, perform worst on the same machines), then the schedule being 
executed is very consistent. We expect this situation to be common in some engineering 
laboratories where initially all machines might be workstations bought from the same 
manufacturer, with the same amount of memory and types of processor(s). As time 
goes on, machines are upgraded: a processor is added, or memory is added. But, 
typically, the machine with the fastest processor would also contain the most memory and 
the most cache. For now, we view this as adding a discrete axis to our already existing 
two axes of heterogeneity, one which represents just two 2-dimensional planes: 
consistent and inconsistent. Future work is needed to determine how we might quantify 
this dimension as a continuous axis. Figure 21 shows the existence of consistency 
between two jobs and four machines. Conversely, if jobs perform well on different 
machines, and poorly on different machines, then the schedule being executed is 
inconsistent. Figure 22 shows inconsistency between two jobs and four machines. We 
depict consistency, the third dimension of heterogeneity, in Figure 20, part (b). 

⁹A Baseline experiment consists of SmartNet building and executing one schedule for a single 
database and each of the four different command files. 

Figure 21. Consistency between two jobs and four machines. Both jobs perform better 
on the same machines. 

To be brief, our nomenclature only includes mention of consistency if the matrix 
we are dealing with is consistent. In other words, when the term “High-Job, 
High-Machine Heterogeneity” is used, the matrix we are using is inconsistent. If 
the term “High-Job, High-Machine, Consistent Heterogeneity” is used, that matrix is 
consistent. 


[Plot axis label: time for execution] 

Figure 22. Inconsistency between two jobs and four machines. The jobs perform 
differently on the different machines; there is no consistency of performance. 



C. SIMULATION EXPERIMENTS 

We performed two simulation experiments on SmartNet, aimed at examining 
how well the scheduling algorithms performed when the jobs scheduled did not execute 
for exactly the mean (of the previous run-times) specified in the SmartNet database. 
We first ran Baseline experiments that compared the performance of SmartNet’s 
various algorithms for the different categories of heterogeneity, without considering 
consistency. Following that, we identified the Baseline matrices for which the O(mn^2) 
Greedy Algorithm out-performed both the Opportunistic Load Balancing (OLB) 
Algorithm and the Limited Best Assignment (LBA) Algorithm. We call the matrices 
in this class significant matrices. We then ran experiments for consistent 
matrices that corresponded to the significant matrices; that is, we ran additional 
Baseline experiments using matrices that were identical to the significant matrices, 
except that the contents of each row were sorted from smallest to largest¹⁰. We call 
the sorted versions of these matrices consistent significant matrices. Finally, for all 
significant matrices, both consistent and inconsistent, we ran additional simulation 
experiments where the jobs did not execute for exactly the mean of the previous 
run-times; in one case the run-time distribution was assumed to be Gaussian, 
and in the other it was assumed to be exponential. The details of the experiments 
are discussed in the following subsections. 
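
The construction of a consistent matrix from a significant matrix can be illustrated with a few lines of Python. The sketch sorts the contents of each row, as described above, and includes a simple check that every job ranks the machines in the same order. The function names and the tiny example matrix are assumptions for illustration, and the check does not treat ties in the first row specially.

```python
def make_consistent(etc):
    """Return a copy of the matrix with each row sorted, smallest first."""
    return [sorted(row) for row in etc]


def is_consistent(etc):
    """True if the machine ordering of the first job suits every job.

    Assumption: the matrix is consistent when visiting the machines in the
    first row's fastest-to-slowest order never decreases any row's values.
    """
    order = sorted(range(len(etc[0])), key=lambda machine: etc[0][machine])
    return all(
        all(row[a] <= row[b] for a, b in zip(order, order[1:]))
        for row in etc
    )


if __name__ == "__main__":
    # Two jobs on three machines; the jobs prefer different machines.
    example = [[30, 10, 20], [5, 9, 7]]
    print(is_consistent(example))
    print(make_consistent(example))
    print(is_consistent(make_consistent(example)))
```

Because each row is sorted independently, the set of values in every row is unchanged, while the assignment of values to machines changes.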

Although the database (matrix) values for the experiments differed greatly, the 
conduct of the experiments was similar throughout. We now describe the features that 
were common to all of the experiments. 

• Database Format. Although the job/machine heterogeneity differed for all 
databases created, each database contained mean run-times for each of five 
different jobs on each of ten different machines. 

• Data Collection. Except for the Baseline experiments, all experiments in which 
the actual run-time of a job could differ from the predicted run-time of that 
job were executed 15 times. In each run, a different value was used to seed the 
random number generator that was used to generate the simulated “actual” 
run-time duration. The total time required to execute each schedule was summed 
and the average was computed. Multiple seeds were used to ensure that our 
results were not skewed¹¹. We only ran the Baseline experiments one time, 
as the execution of this schedule was always the same (because jobs ran for 
exactly the predicted run-times). 

• Scheduling Algorithms. We examined the performance of four scheduling 
algorithms, which are built into SmartNet, during each simulation experiment. 
These algorithms were explained in Chapter II and are listed below. 

- Opportunistic Load Balancing (OLB) 

- Limited Best Assignment (LBA) 

- Greedy, an O(mn^2) algorithm 



¹⁰We note that the average variance of each column is reduced by this sorting, but, as an example, 
for our High-Job, High-Machine Heterogeneity matrices, even the consistent matrices had an average 
column variance on the order of 10* 

¹¹This is a common method to reduce the influence of a single random number generation sequence 
that may be biased. 

- Fast Greedy, an O(mn) algorithm 

Both the Greedy and Fast Greedy scheduling algorithms were pioneered by 
SmartNet. The LBA Algorithm is also contained in HeNCE [Ref. 20] and the 
OLB Algorithm is the only algorithm available in most resource management 
systems such as Condor, LoadLeveler, and NQE. SmartNet contains all of these 
algorithms, which are of different complexity, because SmartNet is a scheduling 
framework and different algorithms are appropriate for different environments. 
(See also previous work done by Benton and Lemanski on scheduling of network 
broadcasts [Ref. 9].) 

• Job Request Format. When SmartNet is run in simulation mode, jobs are 
requested via a command file. The jobs can be requested either in groups or 
sequentially. For example, if we want to request job4 to be executed three 
times, and job5 to be executed 15 times, a grouped request would ask for 
job4 to be run three times, and for job5 to be run 15 times. To accomplish 
the same thing when jobs are submitted sequentially, we might request single 
executions of two different jobs in the order job5, job4, job5, job4, job4, 
and then 13 more single requests of job5. We looked at SmartNet scheduling 
algorithm performance when jobs were requested to be run in group format 
and randomly sequential format; however, the majority of our experiments were 
generated using randomized sequential requests. This was done because the 
order of job request affects the schedule. The Fast Greedy Algorithm maps and 
schedules the jobs on machines in the order in which they are submitted. The 
Greedy Algorithm uses the order to break ties. We chose to execute mostly 
singular requests both because they more closely mimic a real environment 
where different jobs are submitted by different users and because we wished to 
examine whether these algorithms performed better or worse when sequential, 
as opposed to grouped, requests were submitted. 

• Job Request Sets. In order to ensure different results from the grouped method, 
we generated two random sequences of 125 job requests, which we will call 
125-1 and 125-2, where each individual request was chosen according to a 
uniform random distribution from among five different jobs. We also generated 
two more random sets, this time of 500 job requests, calling them 500-3 and 
500-4. We did this to look at performance variations between job request 
orderings, as well as to examine any performance differences that might occur 
because fewer or more jobs were requested. 

• Actual Run-time Distributions. When we generated run-times that were differ- 
ent from the mean predicted run-times, we ran experiments for both Gaussian 
and exponential distributions. 

Based upon our experiments with the NAS IS and EP Benchmarks above, we 
chose to implement a translated distribution with mean of 3.0 in our subsequent 
simulation experiments. That is, we added the expected time to compute for 
a given job/machine pair, less the amount needed to keep the mean from 
changing, to a value drawn from an exponential distribution with mean of 
3.0¹². The simulated run-times were generated using code represented 
by the following pseudo-code. 

- X is the ETC of the job, available from the SmartNet database. 

- Y = X - 3.0 (The 3.0 is taken from the experiments discussed previously.) 

- Z is the random variate generated from an exponential distribution with 
mean 3.0. 

- If ETC > 3.0, Run-time-duration = Y + Z. 

- If ETC < 3.0, Run-time-duration = Z. 

- Return Run-time-duration. 

The actual code for this function is contained in Appendix C. 

Again, based upon our earlier experiments described in Section 1, we chose to 
implement a truncated Gaussian distribution in our simulation experiments. 
We chose to truncate left of the mean at the mean less one sigma. Below is 
the pseudo-code for the algorithm we used to obtain a random variate from a 
truncated Gaussian distribution for run-time duration. 

- σ = √(2nd-moment). 

- while Run-time-duration < 1st-moment - σ 

* Generate random variate X from Gaussian distribution. 

* Run-time-duration = X 

- Return Run-time-duration. 

The pseudo-code describes code used in the function generate_normal(), 
which can be found in Appendix C. 
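
The two sampling procedures described in the pseudo-code can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from Appendix C: the function names are hypothetical, the 2nd-moment is interpreted as the variance, an ETC exactly equal to 3.0 (a case the pseudo-code does not cover) is treated like the smaller case, and the Gaussian sampler rejects any variate that falls below the mean less one sigma.

```python
import math
import random

EXP_MEAN = 3.0  # mean of the exponential part, as in the pseudo-code


def exponential_run_time(etc, rng):
    """Shifted exponential run-time whose mean equals the ETC when ETC > 3.0."""
    z = rng.expovariate(1.0 / EXP_MEAN)  # expovariate takes the rate 1/mean
    if etc > EXP_MEAN:
        return (etc - EXP_MEAN) + z
    return z  # assumption: ETC <= 3.0 falls back to the plain variate


def truncated_gaussian_run_time(first_moment, second_moment, rng):
    """Gaussian run-time, truncated on the left at the mean less one sigma."""
    sigma = math.sqrt(second_moment)  # assumption: 2nd-moment is the variance
    lower_bound = first_moment - sigma
    while True:
        x = rng.gauss(first_moment, sigma)
        if x >= lower_bound:  # keep only variates at or above the cut-off
            return x


if __name__ == "__main__":
    rng = random.Random(42)  # a fixed seed makes the sketch repeatable
    print(exponential_run_time(100.0, rng))
    print(truncated_gaussian_run_time(50.0, 4.0, rng))
```

Because the Gaussian is truncated only on the left, the average of the accepted variates lies somewhat above the first moment, whereas the shifted exponential keeps its mean at the ETC.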

1. Baseline Experiments 

These experiments were used to record SmartNet’s performance when each 
job executed for exactly the amount of time for which it was scheduled. The Baseline 
experiment results show that there are circumstances where the Greedy and Fast 
Greedy Algorithms perform comparably to either OLB or LBA. Complete results 
from all of the Baseline experiments can be found in Appendix F. In this section, 
we provide graphical interpretations of typical SmartNet performance for a subset of 
the experiments. We note that if the run-time of an algorithm is not included in a 
graph below, it performed at least an order of magnitude worse than the included 
algorithms, and was omitted so that we could more readily distinguish between the 
remaining algorithms. 

¹²In later experiments, we will also permit the mean for the exponential distribution to depend 
upon the job/machine pair. 

• High-Job, High-Machine Heterogeneity. See Figure 23. For the High-Job, 
High-Machine Heterogeneity matrix that we presented in Table VI, we see 
that Greedy and Fast Greedy perform comparably to LBA. Since LBA is a 
slightly less compute-intensive scheduling algorithm, it may make sense to use 
the LBA scheduling algorithm instead of Greedy or Fast Greedy in such cases. 
The figure also shows how poorly the OLB Algorithm performs compared to 
the other three. 

Figure 23. 

• High-Job, Low-Machine Heterogeneity. See Figure 24. For our matrix chosen 
from the High-Job, Low-Machine Heterogeneity extreme, we saw that OLB 
performed just about as well as the Greedy and Fast Greedy Algorithms. OLB 
is also a computationally simpler scheduling algorithm. In this case, then, it 
may make sense to use the OLB scheduling algorithm instead of the Greedy 
or Fast Greedy Algorithms. 

Figure 24. 

• Low-Job, High-Machine Heterogeneity. See Figure 25. For our matrix chosen 
from the Low-Job, High-Machine Heterogeneity extreme, we saw that both the 
Greedy and the Fast Greedy Algorithms perform much better than OLB or 
LBA. 

Figure 25. 

• Low-Job, Low-Machine Heterogeneity. See Figure 26. For this matrix, both 
the Greedy and Fast Greedy Algorithms again perform comparably to OLB. 

Figure 26. 



We recall that consistency is the third dimension in the relationship between job 
and machine heterogeneity. We chose to examine two categories of heterogeneity along 
the consistency axis: High-Job, High-Machine, Consistent Heterogeneity, and 
Low-Job, High-Machine, Consistent Heterogeneity. These two categories are among 
the computing environments likely to be found today. When organizations purchase 
computers, they usually buy many similar machines. These machines get upgraded 
or replaced as money becomes available or as equipment breaks. Occasionally, more 
expensive, specialized computers are purchased in small numbers and added 
to the environment. This typically results in consistent behavior amongst machines; 
that is, there will be some machines that all jobs run well on, and some machines 
that all jobs run slower on. The results of the Baseline experiments implied that the 
most interesting run-time behavior would be found in the above two categories. We 
recognize that the other categories merit investigation, but they are outside the scope of 
this present thesis. We note that for both of these categories, the variance of the jobs 
and machines remains similar to that found in their inconsistent counterparts. 

• High-Job, High-Machine, Consistent. See Figure 27. For our matrix chosen 
from this category of heterogeneity, Greedy and Fast Greedy perform better 
than either the OLB or LBA Algorithms. 



[Plot: Baseline Hi-Hi-Consistent Results, 125-1; bars for LBA, Greedy, and Fast Greedy] 

Figure 27. 

• Low-Job, High-Machine, Consistent. See Figure 28. Again, for our matrix 
chosen from this category of heterogeneity, Greedy and Fast Greedy perform 
better than either the OLB or LBA Algorithms. 

To summarize the experiments described above: of these six matrices, 
chosen from categories that represent the extreme ends of 
heterogeneity, the Greedy and Fast Greedy Algorithms develop schedules that are 
worthy of the extra compute time they required in three cases. Based upon these 
results, we opted to only further evaluate the Low-Job, High-Machine; High-Job, 
High-Machine, Consistent; and Low-Job, High-Machine, Consistent matrices in the 
remaining tests. 

Figure 28. 

2. Simulation Experiments where Jobs Ran for Times 
Different from the Predicted Run-times 

This set of experiments examined the performance of the SmartNet scheduling 
algorithms when job run-times differed from the ETC values that were used to develop 
the schedule. For these tests, we used the enhancements that we made to the SmartNet 
simulator, described in Chapter IV. Using these enhancements, we were able to input 
the type of run-time distributions that the jobs being scheduled would have. Using 
the experiments described in Section B of this chapter, we determined the specific 
parameters needed to instantiate the distributions we might find in typical compute 
intensive jobs. We simulated jobs with both exponential and truncated Gaussian 
run-time distributions. 

a. Exponential Distribution Experiments 
The results of these experiments compare the performance of the various 
SmartNet scheduling algorithms when all jobs have an exponential run-time distri- 
bution. We recall from Section B of this chapter that the sample run-times from 
those experiments closely fit a shifted exponential distribution with mean 3.0. The
individual results from the exponential simulation experiments, which are consistent
with the conclusions that we make in this section, can be found in Appendix F in
Table XIX.



Figure 29. Exponential Lo-Hi Results, 500-4 (LBA, Fast Greedy, and Greedy).

Figure 31.

When the results of these experiments are compared to the Baseline
results, we see that jobs with exponential run-time distributions with mean 3.0 have 
completion times comparable to the Baseline results. Figures 29, 30, and 31 show 
these comparisons for the matrices we used in our simulations. These figures show 
that the schedules built by the SmartNet scheduling algorithms are still effective even 
though the actual run-time of a given job on a given machine can differ greatly from 
its corresponding ETC value. 

b. Truncated Gaussian Experiments 

These experiments were designed to examine the performance of the
SmartNet scheduling algorithms when all jobs had truncated Gaussian run-time
distributions. As in the previous experiment, this test takes advantage of the
enhancements made to the SmartNet simulator. While the schedule was built using ETC
data, the simulated run-times generated by the SmartNet simulator were taken from
a truncated Gaussian distribution. In Section B, we discussed the characteristics of
the truncated Gaussian run-time distribution obtained from running the NAS EP
Benchmark. We determined from those experiments that truncation occurred left of
the mean at roughly 1st-moment − √(2nd-moment), or the mean less σ. Throughout this
experiment, the mean, or 1st-moment, was the ETC value for the individual
job/machine pairs, and the 2nd-moment was set at 300% of the 1st-moment, or
3 × mean, to determine whether, if the variance was very large for all jobs, the
Greedy and Fast Greedy Algorithms still performed much better than both the LBA
and OLB algorithms. Any negative run-times that were generated fell to the left of
the truncation point, and so were not used in the experiments. The individual
results from these experiments are included in Appendix F in Table XX.
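
The sampling scheme described above can be sketched as follows. This is a minimal illustration and not the SmartNet simulator's code: the function names are hypothetical, the 2nd-moment is read as a variance equal to 3 × mean, and draws that fall below the truncation point mean − σ are rejected and redrawn. The second function is the standard closed-form mean of a normal distribution truncated on the left, included so that the effect of truncation on the mean can be checked against the samples.

```python
import math
import random


def truncated_gaussian_runtime(etc, rng, variance_factor=3.0):
    """Draw one simulated run-time for a job whose ETC value is `etc`.

    Assumptions (not taken from SmartNet's source): the variance is
    variance_factor * etc, and the left truncation point is mean - sigma.
    """
    sigma = math.sqrt(variance_factor * etc)
    cutoff = etc - sigma
    while True:
        sample = rng.gauss(etc, sigma)
        # Reject anything left of the truncation point. Negative samples are
        # always rejected when cutoff >= 0, i.e. when etc >= variance_factor.
        if sample >= cutoff:
            return sample


def left_truncated_mean(mean, sigma, cutoff):
    """Mean of a normal(mean, sigma) distribution truncated below `cutoff`."""
    a = (cutoff - mean) / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return mean + sigma * pdf / (1.0 - cdf)


if __name__ == "__main__":
    rng = random.Random(0)
    etc = 100.0  # hypothetical ETC value in seconds
    sigma = math.sqrt(3.0 * etc)
    draws = [truncated_gaussian_runtime(etc, rng) for _ in range(20000)]
    print("empirical mean:", sum(draws) / len(draws))
    print("analytic mean: ", left_truncated_mean(etc, sigma, etc - sigma))
```

Under these assumptions, truncating at mean − σ raises the mean by about 0.29σ. For a hypothetical ETC of 100 seconds, σ is about 17.3 seconds, so truncation alone moves the mean up by roughly 5%.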
Figure 32.

The results in Figures 32, 33, and 34 show that the schedules finish up to 25%
later than the schedules executed in the Baseline experiments. This is not
unexpected, as truncation shifts the mean of the resulting distribution to the
right. The results also show that the Greedy and Fast Greedy scheduling algorithms
still perform better than the OLB and LBA Algorithms when job run-time
distributions are truncated Gaussian with very large variances. Our experiments
imply that it may be worthwhile to update the schedule as it is being executed to
minimize the effect of the large job variances that result from run-time
distributions with very large variance, in this case, with variances of 300% of the
mean. This claim is justified because preliminary evidence indicates that the
observed 25% increase in the mean is not fully accounted for by the effects of
truncation. Rescheduling may be warranted because of its relatively low cost,
especially for schedules involving many more machines and many more jobs than used
throughout these simulation experiments.

Figure 33. Truncated Gaussian Hi-Hi-Consistent Heterogeneity Results, 500-3
(Baseline versus T-Gaussian; Greedy and Fast Greedy).

Figure 34.



D. DISCUSSION 

While performing the simulation experiments described previously in this chapter, 
we came across other aspects of SmartNet’s performance that warranted examination. 
We first examine the performance of the SmartNet scheduling algorithms when com- 
pared to theoretical bounds. We follow that with a specific comparison of the Greedy 
and Fast Greedy Algorithms throughout all the simulation experiments. We then 
compare the performance of the Greedy and Fast Greedy Algorithms when the jobs
were submitted according to a uniform random distribution with the performance of 
those algorithms when the submitted requests are sorted and grouped according to 
job. Finally, we present another matrix with High-Job, High-Machine Heterogeneity 
characteristics, but which performs differently than expected. 

1. Theoretical Limits 

SmartNet’s Greedy and Fast Greedy scheduling algorithms consider both the
time for each job to complete on each machine and the current load on each
machine when computing a schedule. Both Greedy and Fast Greedy compute
near-optimal schedules in polynomial time. Because this scheduling problem is
NP-complete, though, no known algorithm computes optimal schedules in polynomial
time, and polynomial-time schedulers can only approach the optimum. However, we
are still interested in determining how close the Baseline completion times are to
the mathematical minimum. We now examine that issue for each of the six matrices
that we enumerated above.

Assuming that we could examine one schedule every nanosecond, it would 
require more than 10®^ years to determine, through exhaustion, which schedule would 
require the minimum amount of time to execute for one of our smallest experiments. 
For this reason, we instead use a looser bound, though still a bound, which we call
the theoretical Best Case Time. We computed it using the following method.

1. From the list of jobs submitted, determine how many of each job are being 
scheduled. This results in a job count for each job. 

2. For each job, multiply the job count by the minimum amount of time that 
job could execute given that it was always assigned to its best machine, also 
assuming that no other type of job is assigned to that machine. This results in 
a min group time for each job. 

3. Sum the min group times. 

4. Divide the sum by the number of machines. The result is the Best Case Time 
for the schedule to execute. 
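
The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary layout of the ETC matrix, and the sample numbers are hypothetical and are not taken from SmartNet.

```python
from collections import Counter


def best_case_time(requests, etc):
    """Lower-bound estimate following the four steps listed above.

    requests: list of job names, one entry per submitted request.
    etc: dict mapping job name -> list of expected run-times, one per machine.
    """
    machine_counts = {len(times) for times in etc.values()}
    if len(machine_counts) != 1:
        raise ValueError("every job needs one ETC value per machine")
    num_machines = machine_counts.pop()

    # Step 1: count how many requests there are for each job.
    job_counts = Counter(requests)
    # Step 2: job count times the run-time on that job's best machine.
    min_group_times = {
        job: count * min(etc[job]) for job, count in job_counts.items()
    }
    # Step 3: sum the min group times.
    total = sum(min_group_times.values())
    # Step 4: divide by the number of machines.
    return total / num_machines


if __name__ == "__main__":
    # Hypothetical 2-job, 3-machine ETC matrix (seconds), for illustration only.
    etc = {"job1": [10.0, 20.0, 40.0], "job2": [30.0, 6.0, 12.0]}
    requests = ["job1"] * 3 + ["job2"] * 5
    print(best_case_time(requests, etc))
```

In the hypothetical example, the min group times are 3 × 10 = 30 and 5 × 6 = 30 seconds, so the Best Case Time is 60 / 3 = 20 seconds.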

For each matrix, we computed the Best Case Time and compared that time
to the Baseline time. The comprehensive results are shown in Table XXI, located in
Appendix F. Table XXI shows that we get closest to theoretical Best Case Time
performance when schedules are created with our High-Job, Low-Machine and Low-Job,
Low-Machine Heterogeneity databases. Figure 35 contains the High-Job, Low-Machine
Heterogeneity comparison. Figure 36 contains the Low-Job, Low-Machine Heterogeneity
comparison. All of the other matrices show at least a 100% increase in run-time
duration over the Best Case Time. The two Low-Machine matrices come closest because
their machine heterogeneity is low, which means that the jobs all run fairly well on
all machines. Low machine heterogeneity gives the algorithms more good choices of
machines on which to schedule jobs. Whenever we have high machine heterogeneity,
there are fewer near-optimal machine choices for the jobs, and some jobs have to be
run on machines that they do not perform well on. These results seem to indicate
that the theoretical Best Case Time can be approached if the machines being
utilized are very similar.

Figure 35. Theoretical Best versus Baseline Completion Time, High-Job, Low-Machine
Heterogeneity. This data depicts the percentage difference between the theoretical
Best Case Time and the Baseline completion time.

Figure 36. Theoretical Best versus Baseline Completion Time, Low-Job, Low-Machine
Heterogeneity. This data depicts the percentage difference between the theoretical
Best Case Time and the Baseline completion time.

2. O(mn) Fast Greedy versus O(mn^2) Greedy

While performing the simulation experiments discussed previously, we saw the
opportunity to compare the performance of two of the scheduling algorithms pioneered
by SmartNet. The Greedy Algorithm has a complexity of O(mn^2), while the Fast
Greedy Algorithm has a complexity of O(mn). What we wanted to know is how much
of a performance gain we see when we invest in the more complex Greedy Algorithm.
This investment can be considerable for very large and complex schedules, and can
have a significant effect upon overall SmartNet time of execution.

Figure 37.

Figure 38.

Additional results are shown in Table XXII, located in Appendix F. Figures 37,
38, and 39 compare the performance of the Greedy to the Fast Greedy Algorithm
for the Baseline, exponential, and truncated Gaussian experiments. We averaged
these run-times across all four sets of jobs. Figure 37 shows that Greedy schedules
complete faster than Fast Greedy schedules for the High-Job, Low-Machine; Low-Job,
High-Machine; Low-Job, Low-Machine; and Low-Job, High-Machine, Consistent
categories of heterogeneity, but that Fast Greedy schedules complete faster for the
High-Job, High-Machine; and High-Job, High-Machine, Consistent categories of
heterogeneity. Figure 38 shows that, for our experiments, when Greedy outperforms
Fast Greedy, the gain is never more than 15%. What this tells us is that the better
schedule execution time gained by using the O(mn^2) Greedy Algorithm may not be
worth the extra computational effort. Depending upon the time required to develop a
schedule with the Greedy Algorithm, it may be more economical, from the standpoint
of compute time required to build a schedule, to use the Fast Greedy scheduling
algorithm. This thesis does not attempt to resolve that issue, as additional but
related research needs to be performed that examines the completion times of
schedules built using the two algorithms under many other categories of
heterogeneity. The question that needs to be answered is: Does a decrease of at most
15% in schedule execution time warrant the use of an O(mn^2) algorithm over an
O(mn) algorithm?

Figure 39. Greedy versus Fast Greedy, Truncated Gaussian experiments.

3. Grouped Submissions versus Uniformly Distributed, 
Sequential Submissions 

Earlier in this chapter, we discussed the method we used throughout our
simulation experiments to request jobs to be run by SmartNet in simulation mode. We
requested one of five different jobs, one at a time, repeatedly, via a command file,
for a total of either 125 or 500 jobs. The jobs that were requested were chosen
independently from a uniform distribution; we call this method of choosing jobs the
Sequential Method. We also described another method of requesting jobs, which we
call the Grouped Method. Using the Grouped Method, jobs are requested in groups
via the command file. Job1 could be requested to run 25 times, which would be
equivalent to requesting Job1 to run once, but listing that request 25 times in a
row in the command file. During the course of our experiments, we became interested
in knowing how schedules performed when jobs were requested with the Grouped Method
as compared to their being requested in a random order using the Sequential Method.
Specifically, we compared the performance of the Greedy Algorithm against the Fast
Greedy Algorithm. We also varied the order in which the grouped jobs were requested
in the command file, as we thought that might make a difference. We set up four
command files, discussed below. In all cases, each request was chosen from the same
group of 5 jobs.

• 125-up: 125 jobs requested in increasing order job1 through job5, 25
repetitions of each job.

• 125-down: 125 jobs requested in decreasing order job5 through job1, 25
repetitions of each job.

• 500-up: 500 jobs requested in increasing order job1 through job5, 100
repetitions of each job.

• 500-down: 500 jobs requested in decreasing order job5 through job1, 100
repetitions of each job.
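
The request orders for the four grouped command files, and for the Sequential Method, can be sketched as below. The actual SmartNet command-file syntax is not reproduced here, so this sketch only builds the order of job names; the function names and the use of Python's `random` module are assumptions made for illustration.

```python
import random

JOBS = ["job1", "job2", "job3", "job4", "job5"]


def grouped_requests(total, descending=False):
    """Grouped Method: equal-sized runs of each job, in job order."""
    repetitions = total // len(JOBS)
    order = list(reversed(JOBS)) if descending else JOBS
    return [job for job in order for _ in range(repetitions)]


def sequential_requests(total, rng):
    """Sequential Method: each request drawn independently and uniformly."""
    return [rng.choice(JOBS) for _ in range(total)]


if __name__ == "__main__":
    files = {
        "125-up": grouped_requests(125),
        "125-down": grouped_requests(125, descending=True),
        "500-up": grouped_requests(500),
        "500-down": grouped_requests(500, descending=True),
    }
    for name, requests in files.items():
        print(name, len(requests), requests[0], requests[-1])
    print("sequential", sequential_requests(10, random.Random(0)))
```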
Figure 40 shows how much faster Greedy schedules executed than the Fast Greedy
schedules when using the Grouped Method of job requests. As before, a positive 
percentage means that the Greedy schedule executes faster than the Fast Greedy 
schedule. 




Figure 40. Greedy versus Fast Greedy, Grouped Method. This figure shows how much 
faster schedules built by the Greedy Algorithm finish executing versus schedules built 
by the Fast Greedy Algorithm. Positive values mean that the Greedy schedule is 
executed faster than the Fast Greedy schedule. 



The results shown in Table XXIII show significant differences between the
two job request methods. The Sequential Method has Fast Greedy schedules completing
before Greedy schedules under High-Job, High-Machine Heterogeneity; however,
the Grouped Method has Greedy schedules executing almost 20% faster than the Fast
Greedy schedules. We see a similar contradiction for High-Job, High-Machine,
Consistent.

Figure 41 shows that the performance of the Greedy Algorithm was not affected
by the way that jobs were requested. For both the Grouped and Sequential Methods,
Greedy performed about the same. Figure 42 shows that the performance of the Fast
Greedy Algorithm was slightly affected by the order in which jobs were requested.

Figure 41. Greedy Performance, Grouped and Sequential Methods. Greedy performed
about the same for both the Grouped and Sequential Methods.

4. Mixed Heterogeneity Matrices 

Previously in this chapter we discussed the characteristics of High-Job,
High-Machine Heterogeneity. We noted that the distribution of the variances of the
columns (machines) in the matrix was unimodal, and that the average variance for
both rows and columns was on the order of 10^10.

We first thought that the magnitude of the variance was a simple way to
characterize the category of heterogeneity. It turns out that this is not the best
way to measure heterogeneity. Consider Table VII, which includes row and column
variances. The average row and column variance is on the order of 10^10. If we use
only these variances, we might conclude that this matrix represents a High-Job,
High-Machine Heterogeneity matrix. However, the last five machines are all very
similar, and have a variance of 79.3. In fact, the distribution of the column
variances is bimodal. One mode is around 79.3, while the other is on the order of
10^10. What the matrix in Table VII represents is a High-Job, High-Machine
Heterogeneity matrix combined with a Low-Job, Low-Machine Heterogeneity matrix.

Figure 42. Fast Greedy Performance, Grouped and Sequential Methods. Fast Greedy
performed slightly worse for both the Grouped and Sequential Methods.

When we ran our Baseline experiments on the Mixed Heterogeneity matrix in
Table VII, we saw that both Greedy and Fast Greedy outperformed OLB and LBA
by at least an order of magnitude. Recall that when we ran our Baseline experiments
on our High-Job, High-Machine matrix, Greedy and Fast Greedy performed
similarly to LBA, while outperforming OLB. Also, when we ran the Baseline
experiments on our Low-Job, Low-Machine Heterogeneity matrix, Greedy and Fast Greedy
performed similarly to OLB.

These results show that the row and column variances of a matrix are not, by
themselves, suitable statistical characterizations of the categories of
heterogeneity. We propose that the number of modes must also be considered. In this
thesis, we primarily concentrate on matrices where both the row and column variances
have only a single mode. Conclusions concerning other matrices, where either the row
or column variances have multiple modes, are beyond the scope of this thesis.

Table VII. A Mixed Heterogeneity Matrix. The average row and column variance is
on the order of 10^10.
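
The point that an average variance can hide more than one mode can be illustrated with a small sketch. The matrix below is hypothetical and is not the thesis's Table VII; the function names are also assumptions. Grouping variances by order of magnitude is only one simple way to spot well-separated modes and is not the statistical method used in this thesis.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def column_variances(matrix):
    """Variance of each column (machine) of a jobs-by-machines ETC matrix."""
    return [pvariance(column) for column in zip(*matrix)]


def magnitude_histogram(variances):
    """Count variances by order of magnitude.

    Several well-separated bins suggest that the distribution of variances
    has more than one mode.
    """
    return Counter(math.floor(math.log10(v)) for v in variances if v > 0)


if __name__ == "__main__":
    # Hypothetical 3-job, 4-machine matrix: two widely varying machines
    # followed by two very similar ones.
    etc = [
        [1.0e5, 4.0e5, 100.0, 110.0],
        [9.0e5, 1.0e5, 120.0, 105.0],
        [3.0e5, 8.0e5, 115.0, 125.0],
    ]
    variances = column_variances(etc)
    print("average column variance:", mean(variances))
    print("orders of magnitude:", dict(magnitude_histogram(variances)))
```

In this example the average column variance is large, yet two of the four columns have variances below 100, which the average alone does not reveal.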



E. CONCLUSION 

This chapter has presented a considerable amount of detailed information about
the experiments performed for this thesis. We explained the job distributions we chose
to implement, as well as why we chose them. We also explained how we categorized
heterogeneity. We presented our Baseline experiments and the results obtained, as well
as the results from simulations where the jobs ran for times other than the predicted
times. We examined how the Baseline results compared to the theoretical Best Case
Time, and compared the performance of SmartNet’s Greedy Algorithm to its Fast
Greedy Algorithm, both when the job submissions were grouped and when
they were individually submitted. We found that SmartNet embodies algorithms that
performed well in all cases and began work on determining which of SmartNet’s
schedulers should be used for each of the various categories of heterogeneity.

VI. SUMMARY AND FUTURE WORK



A. SUMMARY 

This thesis examined the effect of exponential and truncated Gaussian run-time 
distributions on the performance of SmartNet. In order to perform our experiments, 
we first had to enhance the original SmartNet simulator so that simulated job run- 
time durations could be non-deterministic. This non-deterministic behavior must be 
dictated by the type of run-time distribution that a specific job is designated as having. 
The result of this effort was a SmartNet simulator that behaves realistically within 
the bounds of the run-time distribution parameters we specified and implemented. 

With our enhanced version of the SmartNet simulator, we were able to begin 
our examination of SmartNet performance. We discovered early in our experiments 
that we first had to determine the categories of heterogeneity that we wanted to exam- 
ine. In addition, we needed a reference to which we could compare our results. These 
were our Baseline tests, which were tests of SmartNet designed such that the run- 
times did not differ from expected time to complete (ETC) values. The Baseline tests 
showed, for the specific categories of heterogeneity that were examined, the following 
results. 

• For High-Job, High-Machine Heterogeneity (Inconsistent), SmartNet’s O(mn^2)
Greedy and O(mn) Fast Greedy scheduling algorithms performed comparably
to the LBA Algorithm, a slightly less complex algorithm than either Greedy
or Fast Greedy. Because of this similarity of performance, we determined that
further examination of Greedy and Fast Greedy scheduling algorithm performance
was not needed for this category of heterogeneity.

• For High-Job, Low-Machine Heterogeneity (Inconsistent), SmartNet’s O(mn^2)
Greedy and O(mn) Fast Greedy scheduling algorithms performed comparably
to the OLB Algorithm, which is also a less complex algorithm than either
Greedy or Fast Greedy. Additionally, OLB does not require the a priori
information that is required by all of the Greedy algorithms (including Fast
Greedy) and the LBA algorithm. The overhead of the Greedy and Fast Greedy
scheduling algorithms is not warranted for this category of heterogeneity.

• For Low-Job, High-Machine Heterogeneity (Inconsistent), SmartNet’s O(mn^2)
Greedy and O(mn) Fast Greedy scheduling algorithms performed significantly
better than both OLB and LBA. We determined that further study of Greedy
and Fast Greedy performance was warranted for this category of heterogeneity.

• For Low-Job, Low-Machine Heterogeneity (Inconsistent), SmartNet’s O(mn^2)
Greedy and O(mn) Fast Greedy scheduling algorithms performed comparably
once again to OLB. We determined that additional examination of Greedy
and Fast Greedy scheduling algorithm performance was unwarranted for this
category of heterogeneity.

• For High-Job, High-Machine Consistent Heterogeneity, SmartNet’s O(mn^2)
Greedy and O(mn) Fast Greedy scheduling algorithms once again performed
significantly better than both OLB and LBA. We again determined that further
study of Greedy and Fast Greedy performance was warranted for this category
of heterogeneity.

• For Low-Job, High-Machine Consistent Heterogeneity, SmartNet’s O(mn^2)
Greedy and O(mn) Fast Greedy scheduling algorithms again performed
significantly better than both OLB and LBA. We again, therefore, determined that
further study of Greedy and Fast Greedy performance was warranted for this
category of heterogeneity.

With our focus on Low-Job, High-Machine Heterogeneity; High-Job, High-Machine 
Consistent Heterogeneity; and Low-Job, High-Machine Consistent Heterogeneity, we
began our experiments comparing the performance of the various SmartNet scheduling 
algorithms when jobs did not run for the length of time predicted. First, we examined 
the performance of SmartNet when the distribution underlying all jobs executed was 
exponential. The tests showed that the schedules built by the best SmartNet al- 
gorithms were still much better than those built by the less complex, non-intelligent 
SmartNet algorithms. Not only does this show that re-scheduling is often not needed 
after the initial schedule has been somewhat violated, but also that the overhead in- 
volved in using SmartNet’s more intelligent algorithms is warranted even when the 
run-times of jobs can be significantly different from their predicted run-times. 

We next examined the performance of SmartNet when the distribution underlying
all jobs executed was a truncated Gaussian run-time distribution. The ETC values
of the jobs were used as the mean, and truncation occurred to the left at mean − σ.
Variance for these tests was 300% of the mean. Our results show that SmartNet
performance was somewhat affected by jobs whose run-times were from a truncated
Gaussian run-time distribution. We saw up to a 25% increase in the time required to
execute a schedule. Though much of this apparent decrease in performance was an
artifact of our truncation method, some amount of it appears unaccounted for. This
suggests that it may be necessary to recalculate a schedule for jobs that are still
waiting to be executed when we have jobs with this run-time behavior. The relatively
low cost of rescheduling may help minimize any resulting decrease in performance. In
this case, also, we see that the overhead involved in using SmartNet’s more
intelligent algorithms is warranted even when the actual run-times of jobs can be
significantly different from their predicted run-times.

As we performed our experiments, we came across other related areas of Smart- 
Net performance that we were able to examine. First, we looked at the theoretical 
minimum execution time of a schedule and compared that theoretical minimum to 
the performance of the four scheduling algorithms we tested. Our results showed that 
SmartNet’s algorithms often approach the theoretical limits when running tests with 
our High-Job, Low-Machine; and Low-Job, Low-Machine categories of heterogeneity. 
In all other cases, the algorithms performed at least 100% worse than the theoretical 
minimum. We therefore conclude that, for our test environment, SmartNet was able 
to build near optimal schedules when the variation in performance of jobs on machines 
was low.

Next, we compared the performance of SmartNet’s O(mn^2) Greedy and O(mn)
Fast Greedy scheduling algorithms. We determined that the schedules built with the
Greedy algorithm executed faster than those built with Fast Greedy for High-Job,
Low-Machine Heterogeneity; Low-Job, High-Machine Heterogeneity; Low-Job,
Low-Machine Heterogeneity; and Low-Job, High-Machine Consistent Heterogeneity. The
performance gain was never more than 15%, however, when jobs were submitted in
a random order. For all other categories of heterogeneity, schedules built by the
Fast Greedy scheduling algorithm completed faster than those built with Greedy. We
determined that the performance gain may not always outweigh the cost to schedule
with the more complex Greedy algorithm, and that such considerations needed to be
further examined in future research.

We then compared the performance of SmartNet’s intelligent schedulers when 
jobs were requested sequentially and randomly, which we called the Sequential Method, 
against the Grouped Method. Our results showed significant differences in the per- 
formance of the Greedy and Fast Greedy scheduling algorithms when these two meth- 
ods were used. We conclude that there is a need for both methods to be used within 
SmartNet, but that they need to be used appropriately. Further, the differences in 
performance between these two job request methods need to be accounted for when
deciding which scheduling algorithm to use. 

Lastly, we examined a Mixed Heterogeneity Matrix. While both the average
row and column variances were on the order of 10^10, and so it might have appeared to be
a High-Job, High-Machine Heterogeneity matrix, a closer look at the distributions of 
the row and column variances showed us this matrix was very different. The distri- 
bution of the row and column variances for our first matrix was uni-modal, which we 
concluded was characteristic of the High-Job, High-Machine category of heterogeneity. 
However, the distribution of the column variances of the second matrix was bi-modal. 
We concluded that the existence of more than one mode meant that a matrix was 
actually a combination of two different matrices corresponding to two categories of 
heterogeneity — in this case, a High-Job, High-Machine matrix and a Low-Job, Low- 
Machine matrix. When we compared the results of the Baseline experiment for the 
Mixed Heterogeneity Matrix with our High-Job, High-Machine Heterogeneity matrix, 
we saw significant differences in the performance of the Greedy and Fast Greedy
algorithms. These results helped us determine that categories of heterogeneity could
not be statistically categorized by average row and column variance alone, but that
additional statistical study was needed.

Overall, we determined that SmartNet’s algorithms perform well under the
categories of heterogeneity we identified, and that additional research is needed to
further pinpoint ways to increase performance in the many different computing and
network environments likely to be found in the Department of Defense.

B. FUTURE WORK 

There are numerous opportunities for future work related to this thesis. First,
SmartNet performance needs to be further evaluated using additional matrices from
the categories of heterogeneity that we identified as well as with additional examples
of matrices with Mixed Heterogeneity. Additionally, the categories of heterogeneity
most often found in typical environments need to be further researched. SmartNet
performance needs to be further examined when the jobs’ run-time distributions are
different from the ones that we simulated. This creates a need for more study into what
types of distributions we should expect to find in various high performance computing
environments. Further, SmartNet performance should be evaluated when different
jobs execute with different types of run-time distributions. The cost of running
SmartNet’s O(mn^2) Greedy and O(mn) Fast Greedy scheduling algorithms needs to
be traded off against their performance, and the costs and benefits of rescheduling
should also remain a consideration.

APPENDIX A. SMARTNET DATABASE FORMAT



Tables VIII, IX, X, and XI outline the format of the SmartNet database. They 
include fields added because of research performed in this thesis. 



Site Object Fields    Format
--------------------  ------------------------------------
site name             search key string
description           string
latitude              float, global coordinate
longitude             float, global coordinate
bandwidth             float, in bytes/second (within site)
latency               float, in seconds (within site)
notional              integer, 1 or 0 (true or false)
status                integer (unused at this time)

Table VIII. Site Object Database Format



Machine Object Fields      Format
-------------------------  -------------------------------
machine name               search key string
architecture               string (unused at this time)
IP address                 standard internet dot notation
description                string
location                   string
relative cost              float (unused at this time)
relative performance rate  float (unused at this time)
Is the machine notional?   integer, 1 or 0 (true or false)
site name                  string

Table IX. Machine Object Database Format

Model Object Fields                                Format
-------------------------------------------------  -------------------------------
model name                                         search key string
description                                        string
idempotent                                         integer, 1 or 0 (true or false)
The number of compute characteristic               integer
  description lines
compute characteristic’s descriptions,             string
  one line per description

Table X. Model Object Database Format



Model-Machine Object Fields                               Format
---------------------------                               ------
machine name                                              search key string
model name                                                search key string
group name                                                search key string
distribution type                                         string (ARMSTRONG ADDED)
first moment                                              float (ARMSTRONG ADDED)
second moment                                             float (ARMSTRONG ADDED)
third moment                                              float (ARMSTRONG ADDED)
theoretical compute function                              equation, producing seconds
theoretical network function                              equation, producing seconds
theoretical data use function                             equation, producing bytes
theoretical floating-point function                       equation, producing Mflops
relative execution rate                                   float (unused at this time)
experiential compute data written to database by smartnet
The number of compute characteristic description lines    integer
compute characteristic’s descriptions,                    string
  one line per description
experiential network data written to database by smartnet

Table XI. Model-Machine Object Database Format
APPENDIX B. ENHANCEMENTS MADE TO EXISTING SMARTNET CODE



1. INTRODUCTION 

This appendix provides detailed explanations of the changes made to several 
SmartNet files. The changes were made in order to enhance the SmartNet simulator. 
Chapter IV provides an explanation as to why these changes were required. 

2. SERVER/SIMULATOR/JOBSTARTEVENT.CC 

This file details the member functions of the JobStartEvent class. There are
only two functions in this class: a constructor and the function execute(). The
execute() function does several things, but only one of them needed to change.
In simulation mode, the duration that a job is to run was retrieved from
the ETC information provided in the input database. This is where the crux of the
problem with the simulator lay: the duration retrieved is exactly the same as the
ETC value that the schedule was built from, so a simulated job always ran for exactly
its expected time. The function call was:

• job duration = ETC of job provided in database

We changed the above call to:

• job duration = run-time of job calculated from distribution data

The distribution data is provided in the database file (another change). The function
required to calculate the job run-time, based upon this distribution information, is an
addition to the SmartNet simulator code.
3. SERVER/SN-LOG/SN-LOG.C

This file is the program code for the SmartNet logger, which listens to various
SmartNet messages and logs specific details to an output file. This output log file can
then be used to recreate SmartNet runs using the SmartNet replay mechanism. In
the case of the SmartNet simulator, the logger is used to capture run-time and
scheduling information for later evaluation of SmartNet’s performance and behavior.

The enhancements made to this file were minor, but they were important.
We found that the code was not outputting the correct duration for a job
running in simulation mode. The same was true for the times recorded for jobs to
begin. This stemmed from the use of the ETC value for both scheduling and running
jobs in simulation mode. The changes made involved altering variable accesses in the
following functions:

• JobNoticeStart: access the true start time instead of the time variable t

• JobUpdateDone: report the true finish time/duration instead of the time variable t
4. SN-SUBMIT/EXTERNAL.C

This file contains external interface code specific to the sn-submit program. The
sn-submit program must be run to submit jobs to SmartNet from the command
line. Command line submission must be used in simulation mode, because SmartNet’s
graphical user interface does not support simulation mode.

While investigating necessary changes to the SmartNet code, it became evident
that sn-submit was trying to actually start the schedule on the prescribed machines.
This needed to be fixed in order for the simulator to actually be a simulation tool.
We fixed this problem by checking whether simulation mode had been set when
smartnet-master was started. The check for simulation mode was performed in the
sn_external_start() function, as follows:

• If simulation mode is set, return true.

The change allowed sn-submit to run in simulation mode without attempting to
actually start the schedule.






5. SN-SUBMIT/SUBMIT.C 

After examining and changing the sn-submit/external.C file, it became evident
that we needed to be able to start sn-submit in simulation mode. The file submit.C
contains the main program for the sn-submit application. We needed to add
simulation functionality at the command line. This included accepting "-S" as a
command line argument to sn-submit and setting the simulation mode global
variable to true. We added the equivalent of the following pseudo-code.

• Global Integer Variable simulationMode = false;

• If sn-submit includes -S as a command line argument,

- Set simulationMode to true;

- Remove -S from the input argument list;
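
The pseudo-code above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function name stripSimulationFlag and the use of std::vector are hypothetical and are not taken from submit.C, which presumably works on argc/argv directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the global flag named in the pseudo-code.
static bool simulationMode = false;

// Hypothetical helper: scan the argument list for "-S". When the flag is
// found, set simulationMode and leave the flag out of the returned list so
// that later argument handling never sees it.
std::vector<std::string> stripSimulationFlag(const std::vector<std::string>& args) {
    std::vector<std::string> remaining;
    for (const std::string& arg : args) {
        if (arg == "-S") {
            simulationMode = true;      // Set simulationMode to true
        } else {
            remaining.push_back(arg);   // keep every other argument in order
        }
    }
    return remaining;
}
```

Called with the arguments sn-submit, -S and a job file name, the helper would return the list without -S and leave simulationMode set, which is the state the check in sn_external_start() relies on.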
6. SN-SUBMIT/README

This file includes detailed information on how to run sn-submit. We changed 
the README file to include information about the "-S" flag, thus informing the 
user how to run sn-submit in simulation mode. 
7. SERVER/SRC/MODELMACHINE.H

This is the header file for the ModelMachine class. The ModelMachine class 
handles all of the characteristics of individual job-machine pairs. Much of the data is 
provided via the input database file. The format of the database file is included in 
Appendix A. 

Run-time distribution information is necessary for each individual model-machine
pair. In order for the user to specify this information (for experimental purposes),
the run-time distribution information had to be read into SmartNet with the
model-machine data. That meant altering the database file format to account for the
run-time distribution data, which in turn meant providing variables to hold the
run-time distribution data, along with the functions necessary to retrieve
and manipulate those variables. All of the run-time distribution variables and
functions are first seen in ModelMachine.h. The changes made to this file are discussed
below.

Because we referenced specific distribution function information, the
distribution.h header file, written for this research and discussed in Appendix C, had to
be included. We then added the class data members to hold the run-time distribution
information. These data members are private. They include:

• Distribution: an Mstring type

• Moment_1: a float to hold the mean, or first moment

• Moment_2: a float to hold the second moment

• Moment_3: a float to hold the third moment

Public data member accessor functions were then declared. These functions include:

• getDistribution(): returns Distribution

• getMoment_1(): returns Moment_1

• getMoment_2(): returns Moment_2

• getMoment_3(): returns Moment_3
The above member function definitions were included as inline functions listed after
the class definition. They are simple accessor functions that return the value of the
individual data members.

A method had to be written that would allow a run-time duration to be
generated based upon the new run-time distribution data members. By including it in
the ModelMachine class, we had easy access to the necessary data. Also, when the
actual duration is requested (see server/simulator/JobStartEvent.cc above) it
is accessed via a reference to a ModelMachine type. We added the public member
function getRuntime() to provide calculation of the run-time duration.
8. SERVER/SRC/MODELMACHINE.CC

This file contains member functions of the ModelMachine class. The class is
defined in ModelMachine.h, discussed previously. Additions made to ModelMachine.cc
include the following.

1. ModelMachine::init(): Added initialization of run-time distribution data
members:

• Distribution = “”

• Moment_1 = 0.0;

• Moment_2 = 0.0;

• Moment_3 = 0.0;

2. ModelMachine::operator=(ModelMachine &mm): Added assignment overloading
for run-time distribution data members:

• Distribution = mm.Distribution

• Moment_1 = mm.Moment_1

• Moment_2 = mm.Moment_2

• Moment_3 = mm.Moment_3

3. ModelMachine::getRuntime(): This function was added to allow for the
computation of the run-time duration. It returns duration, a DeltaTime type. The
functions generate_normal() and generate_exponential() were written for
this research. They are defined in the file distribution.h, which is included
in this file and discussed later in Appendix C. Here is the pseudo-code.

• If Distribution is equal to “normal”

- duration = generate_normal(Moment_1, Moment_2)

• Else If Distribution is equal to “exponential”

- duration = generate_exponential(Moment_1)

• return duration

4. ModelMachine::read(): This function needed to be altered to allow for the
run-time distribution information to be read in from the database file.
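
The additions to the ModelMachine class described above can be pictured with the following stand-alone sketch. It is not the SmartNet source: the class name ModelMachineSketch, the use of std::string in place of Mstring, the integral DeltaTime typedef, and the two stub generators are all assumptions made so that the dispatch in getRuntime() can be compiled and exercised on its own. Returning 0 for an unrecognized distribution is likewise an assumption; the pseudo-code does not say what happens in that case.

```cpp
#include <cassert>
#include <string>

typedef long DeltaTime;  // assumed integral duration type

// Stub generators (assumptions): they return the rounded mean so that the
// dispatch logic is deterministic. The real functions draw random variates.
inline DeltaTime generate_normal(float mean, float /*second_moment*/) {
    return static_cast<DeltaTime>(mean + 0.5f);
}
inline DeltaTime generate_exponential(float mean) {
    return static_cast<DeltaTime>(mean + 0.5f);
}

class ModelMachineSketch {
public:
    ModelMachineSketch(const std::string& dist, float m1, float m2, float m3)
        : Distribution(dist), Moment_1(m1), Moment_2(m2), Moment_3(m3) {}

    // Simple accessors, one per run-time distribution data member.
    const std::string& getDistribution() const { return Distribution; }
    float getMoment_1() const { return Moment_1; }
    float getMoment_2() const { return Moment_2; }
    float getMoment_3() const { return Moment_3; }

    // Choose the generator that matches the stored distribution type.
    DeltaTime getRuntime() const {
        DeltaTime duration = 0;  // assumed fallback for unknown types
        if (Distribution == "normal") {
            duration = generate_normal(Moment_1, Moment_2);
        } else if (Distribution == "exponential") {
            duration = generate_exponential(Moment_1);
        }
        return duration;
    }

private:
    std::string Distribution;  // the thesis code uses SmartNet's Mstring
    float Moment_1;            // mean, or first moment
    float Moment_2;            // second moment
    float Moment_3;            // third moment
};
```

Keeping getRuntime() inside the class means the caller only needs a reference to the model-machine object, which matches the way the duration is requested from JobStartEvent.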
APPENDIX C. ADDITIONAL CODE FOR THE SMARTNET SIMULATOR



1. INTRODUCTION 

The following subsections contain detailed information about the code we wrote 
specifically for improving the SmartNet simulator. Each explanation is followed by 
the actual code added to the SmartNet simulator. 



2. SERVER/ARMSTRONG/MAKEFILE

The files that were written needed to be compiled with the SmartNet package.
This meant creating a Makefile consistent with the Makefile structure resident in the
SmartNet code. This file allows all of the files below to be compiled whenever the
server is recompiled. Here is the code:

Makefile

# Makefile for Armstrong's Thesis Code
# Used to generate Random Variates for
# use by the SmartNet simulator
# (last mod: 970518)

# Note that comments start with # for this file

# which compiler to use
CC = g++
#CC = CC

# Directory location of include files
#INCS = -I-L/local/lang/SC2.0.1
INCS =
CFLAGS = $(INCS) -g

# What libraries need to be linked
#LIBS = -lm
LIBS =

# Project name to be compiled
PROGS =

# What object files are to be used
OBJS = distribution.o random_generator.o myrand.o

.SUFFIXES: .c .cc .o
.c.o:; cc $(CFLAGS) -c $*.c
.cc.o:; $(CC) $(CFLAGS) -c $*.cc

# What is to be compiled
#all: $(PROGS)
all: $(OBJS)

# The main object file
#mytest: $(OBJS)
#	$(CC) -o mytest $(OBJS) $(CFLAGS) $(LIBS)
# Note -- there is a tab before the $(CC) above

# What the objects are dependent on
#main.o: main.cc proj2.h
#proj2.o: proj2.cc proj2.h
#main.o: main.cc myrand.h random_generator.h distribution.h
distribution.o: distribution.cc myrand.h random_generator.h
random_generator.o: random_generator.cc random_generator.h
myrand.o: myrand.cc myrand.h

# This cleans out everything except the Makefile,
# AAAREADME and source files
clean:; rm -f $(PROGS) *.o core



3. SERVER/ARMSTRONG/MYRAND.H & MYRAND.CC

The myrand files define a function that generates uniformly distributed random
numbers between 0 and 1. The resulting number is used by later functions
to select an entry from an array of 100 seeds, which is intended to assure good
randomization of the numbers produced by another uniform random number generator.
The pseudo-code of the myrand() function follows.

• static int check = false

• Use system time to seed the system random number generator

• If check is false

- static tester = time(NULL)

- check = true

• Seed the system random number generator with tester

• ix = system random number generator output

• answer = ix/(max random number capable of being generated)

• tester = ix

• return answer

The concept is for time to be used to first seed the random number generator. All
subsequent calls to this function use the previously generated value as the seed;
that value survives between calls because the variable is declared static. Static
storage was used because the myrand() function may be called several times
within a single second, and always using the time as the seed would cause the same
seed to be used for several myrand() calls. Here is the code:

myrand.h

// File: myrand.h
// Bob Armstrong
// 12 March 1997

// This function randomly generates numbers between
// 0 and 1

#include <iostream.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

typedef int bobint;

float myrand();

myrand.cc

// File: myrand.cc
// Bob Armstrong
// 12 March 1997

// This function uniformly generates random numbers between
// 0 and 1

#include "myrand.h"
#include <debug/Debug.h>

float myrand()
{
    long double ix;
    static long int tester;
    static bobint check = 0;

    // I am seeding the random function with the time
    srand((int) time(NULL));

    if (!check) {
        tester = time(NULL);
        //tester = 867875440; // used to provide data consistency in testing
        check = 1;
        if (Debug::check("a1")) {
            Debug::out() << "Initial seed (time):\t" << tester << endl;
        }
    }

    srand((unsigned int) tester);
    ix = rand(); // make this the next time seed.
    float answer = ix / (RAND_MAX);
    tester = (long int) ix;

    if (Debug::check("a5")) {
        Debug::out() << "Output of myrand:\t" << answer << endl;
    }
    return answer;
}
4. SERVER/ARMSTRONG/DISTRIBUTION.H & DISTRIBUTION.CC

The distribution files include most of the functions needed to generate the 
various run-time distributions. Functions include normal_01(), generate_normal(), 
and generate_exponential(). 

The function normal_01() uses the polar method for generating normal (0,1) 
random variates. It has no parameters, but returns a single normally distributed 
random variate. This function is used by the generate_normal() function to generate 
Gaussian data based upon the first and second moments. 

• While $WW$ is greater than 1.0

- uniform_random_number_1 is a Uniform(0,1) random number

- uniform_random_number_2 is a Uniform(0,1) random number

- $V_1 = 2 \cdot$ uniform_random_number_1 $- 1$

- $V_2 = 2 \cdot$ uniform_random_number_2 $- 1$

- $WW = V_1^2 + V_2^2$

• End While

• $YY = \sqrt{-2 \log(WW) / WW}$

• random_variate_1 $= YY \cdot V_1$

• random_variate_2 $= YY \cdot V_2$

• Return either random_variate_1 or random_variate_2
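
As a self-contained illustration of the polar method steps listed above, the following sketch draws one N(0,1) variate. It assumes the C++ standard library generator std::mt19937 as the uniform source, which is a substitution for the thesis's own generator, and the name polar_normal is hypothetical. The extra check for WW equal to zero is also an addition, made to avoid taking the logarithm of zero.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draw one N(0,1) variate with the polar method.
// Assumption: std::mt19937 supplies the U(0,1) numbers.
double polar_normal(std::mt19937& gen) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double v1, v2, ww;
    do {
        v1 = 2.0 * uniform(gen) - 1.0;   // map U(0,1) to (-1,1)
        v2 = 2.0 * uniform(gen) - 1.0;
        ww = v1 * v1 + v2 * v2;
    } while (ww > 1.0 || ww == 0.0);     // keep only points inside the unit circle
    const double yy = std::sqrt(-2.0 * std::log(ww) / ww);
    return v1 * yy;                      // v2 * yy is the second variate of the pair
}
```

Each accepted pair yields two variates; the sketch returns only the first, whereas the pseudo-code returns either one.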

The generate_exponential() function receives the first moment and returns
a run-time duration. As explained in Chapter III, the Inverse Transform method
is used to generate these exponentially distributed variates, because the exponential
function, and its inverse, have a closed form.

• Define EXPONENTIAL_RUNTIME

• While duration is less than or equal to 0.0

- seed = 99 × myrand()

- random_number = random_generator(seed)

- If first_moment is greater than EXPONENTIAL_RUNTIME

* adjusted = first_moment - EXPONENTIAL_RUNTIME

* duration = -EXPONENTIAL_RUNTIME × log(random_number)

* duration = duration + adjusted

- Else

* duration = -first_moment × log(random_number)

- End If/Else

• End While

• Return duration

In this function, the constant EXPONENTIAL_RUNTIME is a mean gathered
via experiments with the NAS Benchmarks which is applied to the first moment data
specific to the machine/job pair. It is discussed in greater detail in Chapter V.
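
The inverse transform step can be checked by hand. For an exponential distribution with mean m, the distribution function is F(x) = 1 - exp(-x/m); setting F(x) = U and solving gives x = -m ln(1 - U), and since 1 - U is itself Uniform(0,1) the form x = -m ln(U) is equivalent. The sketch below follows the pseudo-code under stated assumptions: the function names are hypothetical, the uniform number is passed in as a parameter so the arithmetic is easy to verify, and the result is kept as a double rather than being truncated to an integer duration.

```cpp
#include <cassert>
#include <cmath>

// Inverse transform: for u in (0,1], -mean * ln(u) is an exponential
// variate with the given mean.
double exponential_variate(double mean, double u) {
    return -mean * std::log(u);
}

// Shifted form used when the first moment exceeds the benchmark mean:
// a fixed offset plus an exponential tail whose mean is exp_runtime,
// so the overall mean remains first_moment.
double shifted_exponential(double first_moment, double exp_runtime, double u) {
    if (first_moment > exp_runtime) {
        const double adjusted = first_moment - exp_runtime;
        return adjusted + exponential_variate(exp_runtime, u);
    }
    return exponential_variate(first_moment, u);
}
```

With u = exp(-1) the tail contributes exactly its mean, so a first moment of 10 and an exponential mean of 3 give 7 + 3 = 10. The shortest possible duration in the shifted case is the offset itself, first_moment - exp_runtime.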

The generate_normal() function receives the first and second moment as its
parameters and returns a run-time duration. This function calls the normal_01()
function, which generates IID N(0,1) random variates. Implementation of the
generate_normal() function is simple, as shown in the following pseudo-code.

• $XX$ = normal_01()

• duration $= 0.5 +$ first_moment $+ (XX \cdot \sqrt{\text{second\_moment}})$

• Return duration
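
A stand-alone sketch of this scaling step, extended with a simple acceptance test, is shown below. Several things here are assumptions rather than the thesis code: std::normal_distribution replaces normal_01(), the second moment is treated as a variance (as the square root in the pseudo-code suggests), the duration is a long, and the lower limit floor_value is a caller-supplied parameter with a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Scale an N(0,1) draw to the requested mean and variance, round to the
// nearest whole time unit by adding 0.5 before truncation, and redraw
// until the duration is positive and not below floor_value.
long truncated_normal_duration(double mean, double variance, double floor_value,
                               std::mt19937& gen) {
    if (variance == 0.0) {
        return static_cast<long>(std::floor(0.5 + mean));  // no spread: use the mean
    }
    std::normal_distribution<double> n01(0.0, 1.0);
    const double sigma = std::sqrt(variance);
    long duration;
    do {
        duration = static_cast<long>(std::floor(0.5 + mean + sigma * n01(gen)));
    } while (duration <= 0 || duration < floor_value);
    return duration;
}
```

Rejecting and redrawing keeps the shape of the distribution above the limit intact, at the cost of shifting the sample mean upward relative to the requested first moment.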

The 0.5 is added to the duration computation to account for rounding errors. This
function can be changed to generate truncated normal data by only allowing the
duration to be returned if it falls within some limit imposed in the code. That limit
may either be hard coded, or it may be dependent upon a constant relationship
between the first and second moments, which is probably more realistic. The use
of truncated Gaussian is discussed further in Chapter V. The code for these functions
is included below.

distribution.h

// File: distribution.h
// Bob Armstrong
// 4 August 1997
// Thesis code

// This code determines which type of distribution
// the model-machine object carries with it and
// generates a run-time based upon that distribution.

#include <math.h>
#include <string.h>
#include "myrand.h"
#include "random_generator.h"
#include "/users/work3/rkarmstr/SOLARIS/src/sn/lib/spi/DeltaTime.h"

double normal_01();
DeltaTime generate_normal(float, float);
DeltaTime generate_exponential(float);

distribution.cc

// File: distribution.cc
// Bob Armstrong
// 4 August 1997
// Thesis code

// This code determines which type of distribution
// the model-machine object carries with it and
// generates a run-time based upon that distribution.

#include "distribution.h"
#include <debug/Debug.h>

/* This is the polar method of generating normal
   random variates, discussed in Law and Kelton
   "Simulation Modeling and Analysis", pp 490 - 492.
*/
double normal_01()
{
    double random_variate;
    double v1, v2, yy, ww = 2.0;
    int seed;
    float random_number_1, random_number_2;

    while (ww > 1.0) {
        seed = int(99 * myrand());
        if (Debug::check("a2")) {
            Debug::out() << "Seed in normal_01():\t" << seed << endl;
        }
        random_number_1 = random_generator(seed);
        random_number_2 = random_generator(seed);

        v1 = 2 * random_number_1 - 1;
        v2 = 2 * random_number_2 - 1;
        ww = v1 * v1 + v2 * v2;
    }
    yy = sqrt((-2 * log(ww)) / ww);

    // Decide which value to return
    if (myrand() > 0.5) {
        random_variate = v1 * yy;
    }
    else {
        random_variate = v2 * yy;
    }
    if (Debug::check("a4")) {
        Debug::out() << "Random Variate produced by normal_01():\t"
                     << random_variate << endl;
    }
    return random_variate;
}

DeltaTime generate_normal(float moment_1, float moment_2)
{
    DeltaTime duration;
    double xx;
    int checker = 0;
    double sigma = sqrt((double)moment_2);

    if (moment_2 == 0.0) {
        duration = moment_1;
    }
    else {
        while (checker == 0) {
            xx = normal_01();
            duration = (0.5 + moment_1 + sigma * xx);
            if ((duration > 0.0) && (duration >= moment_1 - sigma)) {
                checker = 1;
            }
        } // end while
    } // end else
    return duration;
}

DeltaTime generate_exponential(float moment_1)
{
    int seed;                       // holds seed for random_generator
    DeltaTime duration = -100;      // returned variable
    float adjusted;                 // moment_1 adjusted for EXP_RUNTIME
    float random_number;            // holds random_generator() value
    const float EXP_RUNTIME = 3.0;  // exponential mean; CHANGE THIS
                                    // to adjust exponential characteristics.

    // Only return a runtime duration > 0.
    // Everything takes SOME time to run!
    while (duration <= 0) {
        // Get seed and generate random number
        seed = int(99 * myrand());
        random_number = random_generator(seed);

        // If moment_1 is greater than the runtime value,
        // subtract moment_1 and compute the duration from
        if (moment_1 > EXP_RUNTIME) {
            adjusted = moment_1 - EXP_RUNTIME;
            duration = (int)(- EXP_RUNTIME * log(random_number));
            duration += (DeltaTime) adjusted;
        } else {
            duration = (int)(- moment_1 * log(random_number));
        }
    } // end while
    return duration;
}
5. SERVER/ARMSTRONG/RANDOM_GENERATOR.H & RANDOM_GENERATOR.CC

These files contain the functions necessary to generate uniformly distributed IID
U(0,1) random variates. The generator is needed by the normal_01(), generate_normal(),
and generate_exponential() functions found in the distribution files. As has been
previously discussed, a good source of IID U(0,1) random variables is essential to
the success of any random generator. The following code can be found in
“Simulation Modeling and Analysis,” by Law and Kelton [Ref. 13, pages 454-456]. It
is also included below.

random_generator.h

/* The following 3 declarations are for use of the random-number
   generator rand and the associated functions randst and randgt for
   seed management. This file (named random_generator.h) should be
   included in any program using these functions by executing
       #include "random_generator.h"
   before referencing the functions.
*/

float random_generator(int stream);
void randst(long zset, int stream);
long randgt(int stream);

random_generator.cc

/* File: random_generator.cc

   UNIFORM (0,1) RANDOM NUMBER GENERATOR

   Stolen by: Bob Armstrong from "Simulation
   Modeling and Analysis", by Law and Kelton */

/* Prime modulus multiplicative linear congruential generator Z[i] =
   (630360016 * Z[i-1]) (mod(pow(2, 31) - 1)), based upon Marse and
   Roberts' portable FORTRAN random-number generator UNIRAN. Multiple
   (100) streams are supported, with seeds spaced 100,000 apart.
   Throughout, input argument "stream" must be an int giving the
   desired stream input number. The header file random_generator.h
   must be included in the calling program (#include
   "random_generator.h") before using these functions.

   Usage: (three functions)

   1. To obtain the next U(0,1) random number from stream "stream,"
      execute u = random_generator(stream);
      where rand is a float function. The float variable u will contain the next
      random number.

   2. To set the seed for stream "stream," to a desired value zset,
      execute randst(zset, stream);
      where randst is a void function and zset must be a long set to
      the desired seed, a number between 1 and 2147483646 (inclusive).
      Default seeds for all 100 streams are given in the code.

   3. To get the current (most recently used) integer in the sequence being
      generated for stream "stream" into the long variable zget,
      execute zget = randgt(stream);
      where randgt is a long function. */



#include <iostream.h>
#include <debug/Debug.h>

/* Define the constants. */

#define MODLUS 2147483647
#define MULT1 24112
#define MULT2 26143

/* Set the default seeds for all 100 streams. */

static long zrng[] =
{          0,
  1973272912,  281629770,   20006270, 1280689831, 2096730329, 1933576050,
   913566091,  246780520, 1363774876,  604901985, 1511192140, 1259851944,
   824064364,  150493284,  242708531,   75253171, 1964472944, 1202299975,
   233217322, 1911216000,  726370533,  403498145,  993232223, 1103205531,
   762430696, 1922803170, 1385516923,   76271663,  413682397,  726466604,
   336157058, 1432650381, 1120463904,  595778810,  877722890, 1046574445,
    68911991, 2088367019,  748545416,  622401368, 2122378830,  640690903,
  1774806513, 2132545692, 2079249579,   78130110,  852776735, 1187867272,
  1351423507, 1645973084, 1997049139,  922510944, 2045512870,  898585771,
   243649545, 1004818771,  773686062,  403188473,  372279877, 1901633463,
   498067494, 2087759558,  493157915,  597104727, 1530940798, 1814496276,
   536444882, 1663153658,  855503735,   67784357, 1432404475,  619691088,
   119025595,  880802310,  176192644, 1116780070,  277854671, 1366580350,
  1142483975, 2026948561, 1053920743,  786262391, 1792203830, 1494667770,
  1923011392, 1433700034, 1244184613, 1147297105,  539712780, 1545929719,
   190641742, 1645390429,  264907697,   62038953, 1502074852,  927711160,
   364849192, 2049576050,  638580085,  547070247 };

/* Generate the next random number. */

float random_generator(int stream)
{
    long zi, lowprd, hi31;

    if (Debug::check("a6")) {
        Debug::out() << "Seed into random_generator: \t" << zrng[stream] << endl;
    }

    zi = zrng[stream];
    lowprd = (zi & 65535) * MULT1;
    hi31 = (zi >> 16) * MULT1 + (lowprd >> 16);
    zi = ((lowprd & 65535) - MODLUS) + ((hi31 & 32767) << 16) + (hi31 >> 15);
    if (zi < 0) {
        zi += MODLUS;
    }
    lowprd = (zi & 65535) * MULT2;
    hi31 = (zi >> 16) * MULT2 + (lowprd >> 16);
    zi = ((lowprd & 65535) - MODLUS) + ((hi31 & 32767) << 16) + (hi31 >> 15);
    if (zi < 0) {
        zi += MODLUS;
    }
    zrng[stream] = zi;
    if (Debug::check("a3")) {
        Debug::out() << "Output from random_generator: \t"
                     << ((zi >> 7 | 1) + 1)/16777216.0 << endl;
    }
    return ((zi >> 7 | 1) + 1)/16777216.0;
}

/* Set the current zrng for stream "stream" to zset. */
void randst(long zset, int stream)
{
    zrng[stream] = zset;
}

/* Return the current zrng for stream "stream" */
long randgt(int stream)
{
    return zrng[stream];
}
APPENDIX D. CODE FOR RUNTIME DISTRIBUTION TESTS



1. CODE FOR COUNTING SORT 

The following code is my implementation of the NAS Integer Sort Benchmark. 
It is written to be run on an SGI machine with four processors. The sorting algorithm 
used is a parallel version of the counting sort. The code also includes a non-parallel 
version of counting sort, which was run to provide a comparison for speedup. 
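
For orientation, a complete sequential counting sort can be sketched in a few lines. This is not the benchmark code that follows; it is a minimal C++ illustration that assumes the keys lie in the range [0, max_key), and the names counting_sort_sketch and max_key are hypothetical. The three phases are the tally, the running totals, and the back-to-front placement that the listing's own sort functions perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential counting sort for integer keys in [0, max_key).
std::vector<int> counting_sort_sketch(const std::vector<int>& keys, int max_key) {
    std::vector<int> work(max_key, 0);

    // Phase 1: tally how many times each key value occurs.
    for (int key : keys) {
        work[key]++;
    }
    // Phase 2: running totals; work[k] becomes the number of keys <= k.
    for (int k = 1; k < max_key; ++k) {
        work[k] += work[k - 1];
    }
    // Phase 3: place each key at its final position, scanning back to front.
    std::vector<int> sorted(keys.size());
    for (std::size_t i = keys.size(); i-- > 0; ) {
        const int key = keys[i];
        sorted[work[key] - 1] = key;
        work[key]--;
    }
    return sorted;
}
```

The running time is linear in the number of keys plus the size of the key range, which is why the method suits a benchmark with many keys drawn from a small range.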



/**************************************************************
File:    parallel4.c
Name:    Bob Armstrong
Purpose: This file contains functions executed in the main
         procedure for measurement of the counting sort
         executed in sequence on one processor, in sequence
         forked to one processor, and in parallel forked
         to four processors. The code is written for the
         SGI Challenge L. Measurements are taken and output
         to three files (one for each treatment) for each of
         ten runs of the sort.

         The code is not to the NFS style guide (sue me).
**************************************************************/

#include <stdlib.h>
#include <stdio.h>
#include <ulocks.h>
#include <unistd.h>
#include <stddef.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syssgi.h>

#define TOTAL_KEYS_LOG_2 22
#define MAX_KEY_LOG_2    11
#define TOTAL_KEYS       (1 << TOTAL_KEYS_LOG_2)
#define MAX_KEY          (1 << MAX_KEY_LOG_2)

#define CYCLE_COUNTER_IS_64BIT 1



#if CYCLE_COUNTER_IS_64BIT
typedef unsigned long long iotimer_t;
#else
typedef unsigned int iotimer_t;
#endif

/* These are globals to make the arrays, which are accessed
   randomly, available to all functions. This decreases
   the time spent passing pointers.
*/
int key_array[TOTAL_KEYS];
int work_array[MAX_KEY];
int final_array[TOTAL_KEYS];

/* This is the LOCK stuff. */
usptr_t* handle = NULL;
ulock_t lock_array[MAX_KEY];

/* These are globals to hold the values in work_array
   after the tallys are done in parallel. They
   need to be globals because I can only pass 6 parameters
   in the m_fork call.
*/
int data1 = 0;
int data2 = 0;
int data3 = 0;

/* This is Pedro Tsai's way cool precision timer for SGI machines.
   It was originally written in C++. With MINOR changes, it is
   included here to compile as C code. The units returned by the
   gethrtime() function are picoseconds. Thanks, Pedro!
*/
unsigned int cycleval;
volatile iotimer_t *iotimer_addr;
static int initflag=0;

volatile iotimer_t* initSysTimer()
{
    psunsigned_t phys_addr, raddr;
    int fd, poffmask;

    if ( initflag==0 )
    {
        poffmask = getpagesize() - 1;
        phys_addr = syssgi(SGI_QUERY_CYCLECNTR, &cycleval);

        raddr = phys_addr & ~poffmask;
        fd = open("/dev/mmem", O_RDONLY);
        iotimer_addr = (volatile iotimer_t *)mmap(0, poffmask, PROT_READ,
                                                  MAP_PRIVATE, fd, (off_t)raddr);
        iotimer_addr = (iotimer_t *)((psunsigned_t)iotimer_addr +
                                     (phys_addr & poffmask));
        initflag=1;
    }
    return iotimer_addr;
}

/* get the hardware counter value */
long long gethrtime()
{
    volatile iotimer_t *timer_addr;
    long long counter_value;

    /* Initialize the hardware time counter */
    timer_addr=initSysTimer();
    counter_value=*timer_addr;
    return counter_value;
}



/*
 * FUNCTION RANDLC (X, A)
 *
 * This routine returns a uniform pseudorandom double precision number in the
 * range (0, 1) by using the linear congruential generator
 *
 *     x_{k+1} = a x_k (mod 2^46)
 *
 * where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
 * before repeating. The argument A is the same as 'a' in the above formula,
 * and X is the same as x_0. A and X must be odd double precision integers
 * in the range (1, 2^46). The returned value RANDLC is normalized to be
 * between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
 * the new seed x_1, so that subsequent calls to RANDLC using the same
 * arguments will generate a continuous sequence.
 *
 * This routine should produce the same results on any computer with at least
 * 48 mantissa bits in double precision floating point data. On Cray systems
 * double precision should be disabled.
 *
 * David H. Bailey October 26, 1990
 *
 * IMPLICIT DOUBLE PRECISION (A-H, O-Z)
 * SAVE KS, R23, R46, T23, T46
 * DATA KS/0/
 *
 * If this is the first call to RANDLC, compute R23 = 2 ^ -23, R46 = 2 ^ -46,
 * T23 = 2 ^ 23, and T46 = 2 ^ 46. These are computed in loops, rather than
 * by merely using the ** operator, in order to insure that the results are
 * exact on all systems. This code assumes that 0.5D0 is represented exactly.
 */



/*****************************************************************/
/*************                 RANDLC                 ************/
/*************                                        ************/
/*************    portable random number generator    ************/
/*****************************************************************/

double randlc(X, A)
double *X;
double *A;
{
    static int KS;
    static double R23, R46, T23, T46;
    double T1, T2, T3, T4;
    double A1;
    double A2;
    double X1;
    double X2;
    double Z;
    int i, j;

    if (KS == 0)
    {
        R23 = 1.0;
        R46 = 1.0;
        T23 = 1.0;
        T46 = 1.0;

        for (i=1; i<=23; i++)
        {
            R23 = 0.50 * R23;
            T23 = 2.0 * T23;
        }
        for (i=1; i<=46; i++)
        {
            R46 = 0.50 * R46;
            T46 = 2.0 * T46;
        }
        KS = 1;
    }

    /* Break A into two parts such that A = 2^23 * A1 + A2 and set X = N. */
    T1 = R23 * *A;
    j = T1;
    A1 = j;
    A2 = *A - T23 * A1;

    /* Break X into two parts such that X = 2^23 * X1 + X2, compute
       Z = A1 * X2 + A2 * X1 (mod 2^23), and then
       X = 2^23 * Z + A2 * X2 (mod 2^46). */
    T1 = R23 * *X;
    j = T1;
    X1 = j;
    X2 = *X - T23 * X1;
    T1 = A1 * X2 + A2 * X1;

    j = R23 * T1;
    T2 = j;
    Z = T1 - T23 * T2;
    T3 = T23 * Z + A2 * X2;
    j = R46 * T3;
    T4 = j;
    *X = T3 - T46 * T4;
    return (R46 * *X);
}
/* end randlc(x,a) */

/*****************************************************************/
/*************               CREATE_SEQ               ************/
/*****************************************************************/

/* This function creates the sequence of keys that will be sorted
   by calling the random number generator previously explained
   in this file. It is stored in key_array.
*/
void create_seq( double seed, double a )
{
    double x;
    int i, j, k;

    k = MAX_KEY/4;

    for (i=0; i<TOTAL_KEYS; i++)
    {
        x = randlc(&seed, &a);
        x += randlc(&seed, &a);
        x += randlc(&seed, &a);
        x += randlc(&seed, &a);

        key_array[i] = k*x;
    }
}

/*****************************************************************/
/********************      COUNTING_SORT      ********************/
/*****************************************************************/

/* This function is used to fill the final_array with the sorted keys.
   It is not the entire counting sort algorithm.
*/
void counting_sort(begin, end, div1, div2, div3)
int begin;
int end;
int div1;
int div2;
int div3;
{
    int ix, aa;

    for(ix = end-1; ix > begin - 1; ix--) {
        aa = key_array[ix];
        final_array[work_array[aa]-1] = aa;
        work_array[aa]--;
    }
    return;
}

/*****************************************************************/
/******************** DO_SORT ************************************/
/*****************************************************************/

/* This function, like counting_sort above, only fills the final_array
   with the sorted keys. It is a function meant to be called with
   the m_fork() function call specific to SGI machines.
*/

void do_sort(div1, div2, div3, dd1, dd2, dd3)
int div1;
int div2;
int div3;
int dd1;
int dd2;
int dd3;
{
    int ix, iy, iw, iz, aa, ab, ac, ad,
        wa, wb, wc, wd;
    ulock_t *ba, *bb, *bc, *bd;

    if (m_get_myid() == 0) {
        for(ix = div1-1; ix > -1; ix--) {
            aa = key_array[ix];
            while(ustestlock(lock_array[aa])) {
            }
            ussetlock(lock_array[aa]);
            wa = work_array[aa];
            if(aa < dd1) {
                final_array[wa-1] = aa;
            }else if(aa < dd2) {
                final_array[wa + data1 - 1] = aa;
            }else if(aa < dd3) {
                final_array[wa + data2 + data1 - 1] = aa;
            }else {
                final_array[wa + data3 + data2 + data1 - 1] = aa;
            }
            work_array[aa]--;
            usunsetlock(lock_array[aa]);
        }
    }else if (m_get_myid() == 1) {
        for(iy = div2-1; iy > div1-1; iy--) {
            ab = key_array[iy];
            while(ustestlock(lock_array[ab])) {
            }
            ussetlock(lock_array[ab]);
            wb = work_array[ab];
            if(ab < dd1) {
                final_array[wb-1] = ab;
            }else if(ab < dd2) {
                final_array[wb + data1 - 1] = ab;
            }else if(ab < dd3) {
                final_array[wb + data2 + data1 - 1] = ab;
            }else {
                final_array[wb + data3 + data2 + data1 - 1] = ab;
            }
            work_array[ab]--;
            usunsetlock(lock_array[ab]);
        }
    }else if (m_get_myid() == 2) {
        for(iz = div3-1; iz > div2-1; iz--) {
            ac = key_array[iz];
            while(ustestlock(lock_array[ac])) {
            }
            ussetlock(lock_array[ac]);
            wc = work_array[ac];
            if(ac < dd1) {
                final_array[wc-1] = ac;
            }else if(ac < dd2) {
                final_array[wc + data1 - 1] = ac;
            }else if(ac < dd3) {
                final_array[wc + data2 + data1 - 1] = ac;
            }else {
                final_array[wc + data3 + data2 + data1 - 1] = ac;
            }
            work_array[ac]--;
            usunsetlock(lock_array[ac]);
        }
    }else if (m_get_myid() == 3) {
        for(iw = TOTAL_KEYS-1; iw > div3-1; iw--) {
            ad = key_array[iw];
            while(ustestlock(lock_array[ad])) {
            }
            ussetlock(lock_array[ad]);
            wd = work_array[ad];
            if(ad < dd1) {
                final_array[wd-1] = ad;
            }else if(ad < dd2) {
                final_array[wd + data1 - 1] = ad;
            }else if(ad < dd3) {
                final_array[wd + data2 + data1 - 1] = ad;
            }else {
                final_array[wd + data3 + data2 + data1 - 1] = ad;
            }
            work_array[ad]--;
            usunsetlock(lock_array[ad]);
        }
    }

    return;
}

/*****************************************************************/
/******************** SET_ZERO ***********************************/
/*****************************************************************/

/* This function sets every element of the work_array to zero.
   This function is meant to be called by the m_fork function
   specific to SGI machines.
*/

void set_zero(div1, div2, div3)
int div1;
int div2;
int div3;
{
    int ix, iy, iz, iw;

    if (m_get_myid() == 0) {
        for(ix = 0; ix < div1; ix++)
            work_array[ix] = 0;
    }else if (m_get_myid() == 1) {
        for(iy = div1; iy < div2; iy++)
            work_array[iy] = 0;
    }else if (m_get_myid() == 2) {
        for(iz = div2; iz < div3; iz++)
            work_array[iz] = 0;
    }else if (m_get_myid() == 3) {
        for(iw = div3; iw < MAX_KEY; iw++)
            work_array[iw] = 0;
    }

    return;
}

/******************** VERIFY *************************************/

int verify()
{
    int ix, check;

    for(ix = 1; ix < TOTAL_KEYS; ix++) {
        if(final_array[ix] < final_array[ix-1]) {
            check = 1;
            break;
        }
        else { check = 0; }
    }

    return check;
}

/*****************************************************************/
/******************** INCREMENT **********************************/
/*****************************************************************/

/* This function counts the occurrences of a KEY by incrementing the
   value of work_array[KEY]. Thus, work_array[4] will contain a count
   of the number of keys that are the number 4. This function is
   meant to be called using the SGI function m_fork. It is set up
   for parallel execution.
*/

void increment(div1, div2, div3)
int div1;
int div2;
int div3;
{
    int ix, iy, iz, iw, aa, ab, ac, ad;
    /*ulock_t ba, bb, bc, bd;*/

    if (m_get_myid() == 0) {
        for(ix = 0; ix < div1; ix++){
            aa = key_array[ix];
            while(ustestlock(lock_array[aa])) {
            }
            ussetlock(lock_array[aa]);
            work_array[aa]++;
            usunsetlock(lock_array[aa]);
        }
    }else if (m_get_myid() == 1) {
        for(iy = div1; iy < div2; iy++){
            ab = key_array[iy];
            while(ustestlock(lock_array[ab])) {
            }
            ussetlock(lock_array[ab]);
            work_array[ab]++;
            usunsetlock(lock_array[ab]);
        }
    }else if (m_get_myid() == 3) {
        for(iz = div2; iz < div3; iz++){
            ac = key_array[iz];
            while(ustestlock(lock_array[ac])) {
            }
            ussetlock(lock_array[ac]);
            work_array[ac]++;
            usunsetlock(lock_array[ac]);
        }
    }else {
        for(iw = div3; iw < TOTAL_KEYS; iw++){
            ad = key_array[iw];
            while(ustestlock(lock_array[ad])) {
            }
            ussetlock(lock_array[ad]);
            work_array[ad]++;
            usunsetlock(lock_array[ad]);
        }
    }

    return;
}

/*****************************************************************/
/******************** TALLY **************************************/
/*****************************************************************/

/* This function tallies the number of work_array elements less than
   or equal to the work_array index. This function is called
   by the SGI function m_fork for 4 processors.
*/

void tally(div1, div2, div3)
int div1;
int div2;
int div3;
{
    int ix, iy, iz, iw;

    if (m_get_myid() == 0) {
        for(ix = 1; ix < div1; ix++)
            work_array[ix] += work_array[ix - 1];
    }else if (m_get_myid() == 1) {
        for(iy = div1+1; iy < div2; iy++)
            work_array[iy] += work_array[iy - 1];
    }else if (m_get_myid() == 2) {
        for(iz = div2+1; iz < div3; iz++)
            work_array[iz] += work_array[iz - 1];
    }else if (m_get_myid() == 3) {
        for(iw = div3+1; iw < MAX_KEY; iw++)
            work_array[iw] += work_array[iw - 1];
    }

    return;
}



/*****************************************************************/
/******************** main program *******************************/
/*****************************************************************/
main()
{
    double xx, aa, zz;
    long long duration, end, stop, one, two, three, four, szo, inc, srt, tal;
    FILE *true_sequential, *fork_sequential, *forked;
    float data;
    int ix, iy, yy, dd, dd1, dd2, dd3,
        division, div1, div2, div3;
    char* lock_file = "lock.file";
    unsigned int MAX = 400000;

    /* set up output files */
    true_sequential = fopen("tsequential.dat", "w");
    forked = fopen("forked.dat", "w");

    /* Create a sequence of keys to sort */
    create_seq( 314159265.00, 1220703125.00 );

    /* calculate array boundaries for key_array and final_array */
    division = TOTAL_KEYS/4;
    div1 = division;
    div2 = div1 + division;
    div3 = div2 + division;

    /* calculate array boundaries for work_array */
    dd = MAX_KEY/4;
    dd1 = dd;
    dd2 = dd1 + dd;
    dd3 = dd2 + dd;

    /* Set up lock configuration and handle information */
    usconfig(CONF_INITSIZE, MAX);
    handle = usinit(lock_file);

    /* Initialize the lock_array (first time) */
    for(ix = 0; ix < MAX_KEY; ix++) {
        lock_array[ix] = usnewlock(handle);
        usinitlock(lock_array[ix]);
    }

    /* Initialize memory by running the sequential sort once */
    m_fork(set_zero, dd1, dd2, dd3);
    for(ix = 0; ix < TOTAL_KEYS; ix++) {
        work_array[key_array[ix]]++;
    }
    for(ix = 1; ix < MAX_KEY; ix++) {
        work_array[ix] += work_array[ix - 1];
    }
    counting_sort(0, TOTAL_KEYS);

    /* Run the sort sequentially (single processor)
       as a baseline measurement for speedup.
    */
    for(iy = 0; iy < 1000; iy++) {
        end = gethrtime();   /* start time */

        /* initialize work_array to zero */
        for(ix = 0; ix < MAX_KEY; ix++) {
            work_array[ix] = 0;
        }

        /* count occurrences of each key being sorted */
        for(ix = 0; ix < TOTAL_KEYS; ix++) {
            work_array[key_array[ix]]++;
        }

        /* count the elements in work_array less than or equal to ix */
        for(ix = 1; ix < MAX_KEY; ix++) {
            work_array[ix] += work_array[ix - 1];
        }

        /* sort the keys into final_array */
        counting_sort(0, TOTAL_KEYS);

        stop = gethrtime();

        /* Verify proper sorting */
        if(verify()) {
            printf("True Sequential Final-Array (run %d) failed verification!\n", iy);
        }
        else {
            printf("True Sequential Final-Array (run %d) passed verification!\n", iy);
        }

        duration = stop - end;                /* calculate duration */
        data = (float)duration/1000000000;    /* convert duration to seconds */
        fprintf(true_sequential, "Optimum Sequential sort time is: %f\n", data);
    } /* end for */

    fclose(true_sequential);

    /* set number of processors to 4 */
    m_set_procs(4);

    /* Initialize memory by running the forked sort once */
    m_fork(set_zero, dd1, dd2, dd3);
    m_fork(increment, div1, div2, div3);
    m_fork(tally, dd1, dd2, dd3);
    m_fork(do_sort, div1, div2, div3, dd1, dd2, dd3);

    /* Perform the counting sort using forking
       and all 4 processors. This is what we
       "hope" provides speedup.
    */
    for(iy = 0; iy < 1000; iy++) {
        end = gethrtime();   /* start time */

        /* initialize work_array to zero */
        m_fork(set_zero, dd1, dd2, dd3);
        one = gethrtime();

        /* count occurrences of each key being sorted */
        m_fork(increment, div1, div2, div3);
        two = gethrtime();

        /* count the elements in work_array less than or equal to ix */
        m_fork(tally, dd1, dd2, dd3);
        three = gethrtime();

        /* Record tally sums (at the upper interval limit) in globals */
        data1 = work_array[dd1 - 1];
        data2 = work_array[dd2 - 1];
        data3 = work_array[dd3 - 1];
        four = gethrtime();

        /* sort the keys into final_array */
        m_fork(do_sort, div1, div2, div3, dd1, dd2, dd3);
        stop = gethrtime();

        if(verify()) {
            printf("Fully Forked Final-Array (run %d) failed verification!\n", iy);
            exit(1);
        }
        else {
            printf("Fully Forked Final-Array (run %d) passed verification!\n", iy);
        }

        duration = stop - end;   /* calculate duration */
        szo = one - end;
        inc = two - one;
        tal = three - two;
        srt = stop - four;

        data = (float)duration/1000000000;   /* convert duration to seconds */
        fprintf(forked, "Forked sort time is: %lf\n", data);
        data = (float)szo/1000000000;
        fprintf(forked, "Time spent in set_zero:\t %f\n", data);
        data = (float)inc/1000000000;
        fprintf(forked, "Time spent in increment:\t %f\n", data);
        data = (float)tal/1000000000;
        fprintf(forked, "Time spent in tally:\t %f\n", data);
        data = (float)srt/1000000000;
        fprintf(forked, "Time spent in do_sort:\t %f\n", data);
    }

    fclose(forked);
    return 0;
}
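
The functions set_zero, increment, tally, and counting_sort in the listing above are the four phases of a counting sort, split across processors with m_fork. As a reading aid, the following is a minimal single-threaded sketch of the same four phases in standard C. It is not part of the thesis code: the name counting_sort_seq, its parameters, and the locally allocated work array (in place of the listing's global arrays and SGI locks) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the thesis listing.
   Sorts n keys, each in [0, max_key), from keys[] into out[].
   Returns 0 on success, -1 if the work array cannot be allocated. */
int counting_sort_seq(const int *keys, int *out, size_t n, int max_key)
{
    size_t i;
    int k;
    int *work;

    assert(max_key > 0);

    /* Phase 1 (set_zero): calloc hands back a zeroed work array. */
    work = calloc((size_t)max_key, sizeof *work);
    if (work == NULL)
        return -1;

    /* Phase 2 (increment): count the occurrences of each key. */
    for (i = 0; i < n; i++) {
        assert(keys[i] >= 0 && keys[i] < max_key);
        work[keys[i]]++;
    }

    /* Phase 3 (tally): running total, so work[k] becomes the number
       of keys less than or equal to k. */
    for (k = 1; k < max_key; k++)
        work[k] += work[k - 1];

    /* Phase 4 (counting_sort): walk the keys from last to first and
       drop each one into its final slot. */
    for (i = n; i-- > 0; ) {
        int key = keys[i];
        out[work[key] - 1] = key;
        work[key]--;
    }

    free(work);
    return 0;
}
```

In the forked version each processor tallies only its own quarter of work_array, which appears to be why do_sort adds the offsets data1, data2, and data3 when it computes a key's final position; the sketch needs no such offsets because the running total spans the whole array.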
APPENDIX E. SIMULATION EXPERIMENTAL DATA

1. HETEROGENEITY QUADRANT DATA

The tables included in this appendix are the “shorthand” matrices referred to
in Chapter V.





                            Machine
Job               1        2        3        4        5
 1   mean     30034       11      239    30097      533
 2   mean        25     1003     8619       75    65037
 3   mean      1078       93     1950   204001     8081
 4   mean     35096     9501       29     2582     1000
 5   mean        63    45055  1074075    11533       15

                            Machine
Job               6        7        8        9       10
 1   mean        69    42799     1396    52453     4652
 2   mean     30093     4723    11372    16333      287
 3   mean       233        9      193      566    63526
 4   mean     75019    23333      782     1134     1705
 5   mean       403      207     6374   304291      666

Table XII. High-Job, High-Machine Heterogeneity.


                            Machine
Job               1        2        3        4        5
 1   mean        25       26       27       28       29
 2   mean       175      166      174      167      173
 3   mean      3095     3094     3009     3096     3093
 4   mean      9900     9899     9898     9897     9896
 5   mean     30007    30006    30005    30004    30003

                            Machine
Job               6        7        8        9       10
 1   mean        30       31       32       33       34
 2   mean       168      172      169      171      170
 3   mean      3097     3092     3098     3091     3099
 4   mean      9901     9902     9903     9904     9905
 5   mean     30002    30001    30000    30008    30009

Table XIII. High-Job, Low-Machine Heterogeneity.





                            Machine
Job               1        2        3        4        5
 1   mean         5     1003      101       29     2002
 2   mean         6     1001      104       25     2001
 3   mean         9     1002      102       27     2000
 4   mean         8     1000      103       28     2004
 5   mean         7     1004      100       26     2003

                            Machine
Job               6        7        8        9       10
 1   mean        69     5500      300     9996       25
 2   mean        65     5499      299    10000       22
 3   mean        67     5497      298     9998       23
 4   mean        66     5498      297     9999       21
 5   mean        68     5496      296     9997       24

Table XIV. Low-Job, High-Machine Heterogeneity.


                            Machine
Job               1        2        3        4        5
 1   mean        23       22       21       20       24
 2   mean        24       23       22       21       25
 3   mean        25       24       23       22       26
 4   mean        26       25       24       23       27
 5   mean        27       26       25       24       28

                            Machine
Job               6        7        8        9       10
 1   mean        25       27       28       31       30
 2   mean        26       25       26       28       20
 3   mean        27       23       24       25       28
 4   mean        28       21       22       22       22
 5   mean        29       19       20       19       25

Table XV. Low-Job, Low-Machine Heterogeneity.





                            Machine
Job               1        2        3        4        5
 1   mean    300034    52453    42799    30097     4652
 2   mean     65037    30093    16333    11372     8619
 3   mean    204001    63526     8081     1950     1078
 4   mean     75019    35096    23333     9501     2582
 5   mean   1074075   304291    11533     6374      666

                            Machine
Job               6        7        8        9       10
 1   mean      1396      533      239       69       11
 2   mean      4723     1003      287       75       25
 3   mean       566      233      193       93        9
 4   mean      1705     1134     1000      782       29
 5   mean      6374      403      207       63       15

Table XVI. High-Job, High-Machine, Consistent Heterogeneity.


                            Machine
Job               1        2        3        4        5
 1   mean      9996     5500     2002     1003      300
 2   mean     10000     5499     2001     1001      299
 3   mean      9998     5497     2000     1002      298
 4   mean      9999     5498     2004     1000      297
 5   mean      9997     5496     2003     1004      296

                            Machine
Job               6        7        8        9       10
 1   mean       101       69       29       25        5
 2   mean       104       65       25       22        6
 3   mean       102       67       27       23        9
 4   mean       103       66       28       21        8
 5   mean       100       68       26       24        7

Table XVII. Low-Job, High-Machine, Consistent Heterogeneity.


APPENDIX F. SIMULATION EXPERIMENT RESULTS

1. ZERO-VARIANCE SIMULATION EXPERIMENT RESULTS



125-1            Hi-Hi      Hi-Lo   Lo-Hi   Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB          1,074,104    119,576  10,000     314     204,001       9,998
LBA                783    690,000     883   1,094       2,221         883
Greedy             782    105,620     483     289       1,666         483
Fastgreedy         783    119,454     507     293       1,664         505

125-2            Hi-Hi      Hi-Lo   Lo-Hi   Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB          1,074,074    122,491   9,996     319      75,019      10,000
LBA                754    750,000     869     964       2,291         869
Greedy             754    109,500     472     288       1,681         472
Fastgreedy         754    122,396     491     299       1,669         494

500-3            Hi-Hi      Hi-Lo   Lo-Hi   Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB          1,074,075    458,809   9,996   1,230     204,001       9,997
LBA              2,842  3,090,000   3,497   4,240       8,834       3,497
Greedy           2,726    439,018   1,874   1,119       6,514       1,874
Fastgreedy       2,697    458,910   1,931   1,158       6,498       1,931

500-4            Hi-Hi      Hi-Lo   Lo-Hi   Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB          1,082,723    430,923   9,998   1,235     304,291       9,996
LBA              2,813  2,880,000   3,485   4,300       8,858       3,485
Greedy           2,697    416,990   1,865   1,116       6,527       1,865
Fastgreedy       2,639    430,781   1,930   1,155       6,506       1,924

Table XVIII. Baseline Simulation Experiment Results. Heterogeneity should be read
Job-Machine. Also, “Con” refers to consistency; absence of “Con” means the
heterogeneity is inconsistent.


2. RESULTS OF SIMULATION EXPERIMENTS WHERE JOBS RAN FOR TIMES DIFFERENT FROM
PREDICTED TIMES

a. Exponential Run-time Distribution Experiment Results



125-1             Lo-Hi    Hi-Hi-Con    Lo-Hi-Con
OLB            9,999.53   203,999.80     9,994.13
LBA              873.27     2,189.73       851.00
Greedy           497.93     1,657.53       489.33
Fastgreedy       523.00     1,643.87       538.93

125-2             Lo-Hi    Hi-Hi-Con    Lo-Hi-Con
OLB            9,994.80    75,020.47     9,995.93
LBA              854.87     2,289.00       846.40
Greedy           481.80     1,662.13       492.33
Fastgreedy       705.13     1,642.20       502.07

500-3             Lo-Hi    Hi-Hi-Con    Lo-Hi-Con
OLB            9,995.13   204,001.27     9,995.93
LBA            3,470.33     8,805.07     3,482.00
Greedy         1,905.13     6,504.20     1,899.53
Fastgreedy     1,956.53     6,507.93     1,967.33

500-4             Lo-Hi    Hi-Hi-Con    Lo-Hi-Con
OLB            9,998.20   304,291.67     9,995.60
LBA            3,458.20     8,854.07     3,459.07
Greedy         1,905.60     6,565.33     1,927.13
Fastgreedy     1,964.80     6,521.60     1,956.33

Table XIX. Exponential Experiment Results for the Low-Job, High-Machine;
High-Job, High-Machine, Consistent; and Low-Job, High-Machine, Consistent
categories of heterogeneity.


b. Truncated Gaussian Run-time Distribution Experiment Results



125-1             Lo-Hi      Hi-Hi-Con    Lo-Hi-Con
OLB           10,032.93     300,807.00    10,066.73
LBA            1,054.13       2,477.47     1,055.40
Greedy           594.93       1,871.87       572.87
Fastgreedy       603.80       1,839.00       594.93

125-2             Lo-Hi      Hi-Hi-Con    Lo-Hi-Con
OLB           10,029.27     300,053.20    10,045.60
LBA            1,032.07       2,530.53     1,037.87
Greedy           574.20       1,879.87       564.27
Fastgreedy       593.13       1,885.40       603.53

500-3             Lo-Hi      Hi-Hi-Con    Lo-Hi-Con
OLB           10,092.27     304,570.27    10,045.40
LBA            4,251.40       9,983.20     4,247.00
Greedy         2,298.40       7,288.93     2,295.00
Fastgreedy     2,343.80       7,269.93     2,341.33

500-4             Lo-Hi      Hi-Hi-Con    Lo-Hi-Con
OLB           10,056.73   1,074,996.07    10,032.47
LBA            4,250.27       9,988.60     4,209.93
Greedy         2,285.47       7,342.80     2,275.73
Fastgreedy     2,357.47       7,304.87     2,336.07

Table XX. Truncated Gaussian Experiment Results for the Low-Job, High-Machine;
High-Job, High-Machine, Consistent; and Low-Job, High-Machine, Consistent
categories of heterogeneity.


3. ADDITIONAL EXPERIMENTS

a. Comparison of Baseline Run-time and Theoretical Best Case Run-time



125-1           Hi-Hi    Hi-Lo     Lo-Hi     Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB           483512%      14%    11225%       22%      91750%      11222%
LBA              252%     561%      900%      327%        900%        900%
Greedy           252%       1%      447%       12%        650%        447%
Fastgreedy       252%      14%      474%       14%        649%        471%

125-2           Hi-Hi    Hi-Lo     Lo-Hi     Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB           468723%      13%    11402%       25%      32645%      11407%
LBA              229%     595%      900%      278%        900%        900%
Greedy           229%       1%      443%       13%        633%        443%
Fastgreedy       229%      13%      465%       17%        628%        468%

500-3           Hi-Hi    Hi-Lo     Lo-Hi     Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB           121484%       4%     2758%       20%      22992%       2758%
LBA              221%     605%      900%   315.89%        900%        900%
Greedy           208%    0.24%      435%        9%        637%        435%
Fastgreedy       205%       4%      452%       13%        635%        452%

500-4           Hi-Hi    Hi-Lo     Lo-Hi     Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
OLB           122131%       3%     2768%       21%      34252%       2768%
LBA              217%     592%      900%      321%        900%        900%
Greedy           204%    0.23%      435%        9%        636%        435%
Fastgreedy       197%       3%      453%       13%        634%        452%

Table XXI. Theoretical Best versus Baseline Completion Time. This data depicts
the percentage difference between the theoretical Best Case Time and the baseline
completion time. In every case, SmartNet builds a schedule which takes longer to
execute than the theoretical Best Case Time.
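
The percentages in Table XXI appear to follow the usual relative-difference form. This is an inference from the tabulated values rather than a formula stated in the text; the symbols T_base (a baseline completion time from Table XVIII) and T_best (the theoretical Best Case Time) are introduced here only for illustration.

```latex
\[
  \Delta_{\%} \;=\; 100 \times \frac{T_{\mathrm{base}} - T_{\mathrm{best}}}{T_{\mathrm{best}}}
\]
```

As a consistency check under that assumption, two 125-1 Lo-Hi entries imply the same T_best: LBA finishes in 883 at 900%, giving 883/10 = 88.3, and Greedy finishes in 483 at 447%, giving 483/5.47, which is also about 88.3.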
b. Greedy versus Fast Greedy Performance



Test            Hi-Hi   Hi-Lo    Lo-Hi   Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
Baseline       -0.77%   8.18%    3.88%   3.05%      -0.35%       3.86%
Exponential       n/a     n/a   14.30%     n/a      -0.66%       4.30%
T-Gaussian        n/a     n/a    2.48%     n/a      -0.56%       3.87%

Table XXII. Greedy versus Fast Greedy, Sequential Method. This table shows
how much faster schedules built by the Greedy algorithm finish executing than
schedules built by the Fast Greedy algorithm, using the Sequential Method of job
request. Positive values mean that the Greedy schedule is executed xx% faster
than the Fast Greedy schedule.
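
The thesis text does not spell out how the Table XXII percentages were computed. The Baseline row is, however, reproduced by averaging (Fast Greedy time - Greedy time) / Greedy time over the four tests (125-1, 125-2, 500-3, 500-4) in Table XVIII. The sketch below encodes that inferred formula; the function name mean_pct_faster and its signature are hypothetical and were chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thesis.
   Returns the mean, over n tests, of the percentage by which the
   Fast Greedy completion time exceeds the Greedy completion time.
   A positive result means the Greedy schedule finished sooner. */
double mean_pct_faster(const double *greedy, const double *fast, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        assert(greedy[i] > 0.0);
        sum += (fast[i] - greedy[i]) / greedy[i];
    }
    return 100.0 * sum / (double)n;
}
```

Feeding in the Lo-Hi column of Table XVIII (Greedy 483, 472, 1874, 1865; Fast Greedy 507, 491, 1931, 1930) yields roughly 3.88, matching the Baseline Lo-Hi entry; the Hi-Hi column yields roughly -0.77 in the same way.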



c. Grouped versus Sequential Job Request Methods 





                  Hi-Hi   Hi-Lo   Lo-Hi   Lo-Lo   Hi-Hi-Con   Lo-Hi-Con
Grouped Method   19.95%   2.44%   5.37%   8.12%       4.58%       5.37%

Table XXIII. Greedy versus Fast Greedy, Grouped Method. This table shows how
much faster schedules built by the Greedy algorithm finish executing than schedules
built by the Fast Greedy algorithm. Positive values mean that the Greedy schedule
is executed xx% faster than the Fast Greedy schedule.


APPENDIX G. HOW TO RUN SMARTNET



1. GETTING STARTED 

a. Unpacking the Code 

The SmartNet development team suggests unpacking the code into a directory
called SOLARIS. We follow that advice throughout this appendix. The name SOLARIS
is used because we used the Solaris operating system version of SmartNet, and hence
compiled the code on a Solaris machine. Take the sn.tar.gz file, move it into the
SOLARIS directory, and unzip it. Next, execute the command

    tar xvf sn.tar

and the source code will expand.

b. Setting the Environment 

In order to compile and run SmartNet, your environment must be set properly.
Below is all that I needed to do to set my environment for use at NPS (my login name
was rkarmstr; substitute your path and login name as appropriate).

    # setup for SmartNet
    setenv SNROOT /users/work3/rkarmstr/SOLARIS
    set path=($path /users/work3/rkarmstr/SOLARIS/local/bin)
    set path=($path /opt/cygnus/bin)
    set path=($path /usr/xpg4/bin)
    setenv LD_LIBRARY_PATH /usr/include:$LD_LIBRARY_PATH

c. Compiling SmartNet 

While this used to be a terribly difficult procedure at NPS, we fixed the
difficulties, and the process now works. Compiling must be performed on a
machine running the Solaris operating system. There are two such machines available
at NPS, cincinnatus and virgo. Both machines are running SunOS 5.5*, and both
machines are SPARCstation-20s. To compile SmartNet, perform the following
tasks, in order. (This assumes you have already installed the code.)

*SunOS 5.5 is also called Solaris 2.5.

1. telnet virgo or telnet cincinnatus.

2. cd /SOLARIS

3. src/sn/configure --enable-use_gnumake --enable-use_gcc

4. make depend

5. make



Other command line arguments to configure are listed below. 



Usage: configure [options] [host]
Options: [defaults in brackets after descriptions]

Configuration:
  --cache-file=FILE        cache test results in FILE
  --help                   print this message
  --no-create              do not create output files
  --quiet, --silent        do not print 'checking...' messages
  --version                print the version of autoconf that
                           created configure

Directory and file names:
  --prefix=PREFIX          install architecture-independent
                           files in PREFIX [/usr/local]
  --exec-prefix=PREFIX     install architecture-dependent
                           files in PREFIX [same as prefix]
  --srcdir=DIR             find the sources in DIR
                           [configure dir or ..]
  --program-prefix=PREFIX  prepend PREFIX to installed
                           program names
  --program-suffix=SUFFIX  append SUFFIX to installed
                           program names
  --program-transform-name=PROGRAM
                           run sed PROGRAM on
                           installed program names

Host type:
  --build=BUILD            configure for building on
                           BUILD [BUILD=HOST]
  --host=HOST              configure for HOST [guessed]
  --target=TARGET          configure for TARGET [TARGET=HOST]

Features and packages:
  --disable-FEATURE        do not include FEATURE
                           (same as --enable-FEATURE=no)
  --enable-FEATURE[=ARG]   include FEATURE [ARG=yes]
  --with-PACKAGE[=ARG]     use PACKAGE [ARG=yes]
  --without-PACKAGE        do not use PACKAGE
                           (same as --with-PACKAGE=no)
  --x-includes=DIR         X include files are in DIR
  --x-libraries=DIR        X library files are in DIR

--enable and --with options recognized:
  --enable-use_gnumake     use the gnumake utility,
                           very nifty indeed
  --enable-use_gcc         use the gcc compiler instead of
                           native compiler
  --enable-use_DEBUG       make this thing DEBUG'ed
  --enable-use_OPTIMIZE    make this thing OPTIMIZE'ed
  --enable-use_RELEASE     make a releable version.
  --enable-use_static_link make static linked binaries.
  --enable-use_purecov     make static linked binaries.
  --with-x                 use the X Window System

After several minutes, you will have compiled all the SmartNet binaries. 

2. USING THE SMARTNET SIMULATOR 

This section assumes that the user has access to the SmartNet Users Guide [Ref. 
10]. The Users Guide includes extensive instructions for and examples of commands 
for running SmartNet. The Users Guide does not include any information about 
running SmartNet in simulation mode, however. This section explains how to run 
SmartNet in simulation mode. 

a. Files 

In order to run SmartNet in simulation mode, there is specific information that 
needs to be provided in certain files that will make SmartNet perform correctly. 






i. .smartnetrc 

This file is required by SmartNet whether or not it is being run
in simulation mode. The file may need to be altered, depending upon what we
are trying to measure with the simulator. Here is a sample .smartnetrc file.

    dbInFilename:    /users/work3/rkarmstr/SOLARIS/local/tests/hihi.0.0.dat
    dbOutFilename:   /dev/null
    scheduler:       OLB
    rescheduleMode:  Off
    debug:           none
    debugFile:       /dev/null
    verbosity:       V q

In the above .smartnetrc file, we would need to change the name of the input database
file, dbInFilename, depending on the test we were running. The scheduling
algorithm, scheduler, would also need to be changed. Lastly, we may need to enable
the reschedule capability, rescheduleMode, to allow rescheduling to occur.
The other lines can be altered as desired; all fields in the .smartnetrc
file are explained in the Users Guide.

ii. Command File 

The command file lists the jobs to be scheduled and subsequently run by
SmartNet. In simulation mode, SmartNet needs the command file data in order to
know which jobs are to be scheduled and have their execution simulated. An example
of two types of command files is available in Section 4 of this appendix. The
command file can be anywhere in our directory structure; we will specify it by name
and location when needed.

b. Commands 

In order to run SmartNet in simulation mode, several executables must be
started in a particular order. First, the SmartNet master must be started in simulation
mode. This starts the SmartNet server in simulation mode as well as the SmartNet
queue. It also reads the SmartNet database for use by the scheduler. An example
database is located in Section 5 of this appendix. Together, these programs start
SmartNet. Next, we need to start the SmartNet logger, which enables logging of
all job execution and scheduling messages. After the SmartNet logger, we start the
SmartNet submit program in simulation mode, which submits jobs, via the command
file, to SmartNet so that these jobs can be scheduled.

After these commands have been entered, SmartNet will build a schedule,
simulate the execution of the schedule, and stop. The SmartNet master, queue, and
server will keep running until killed. SmartNet submit also remains running, and must
be killed by process number. The SmartNet master and the rest can be killed with the
command sn-control --OFF. Note that the SmartNet logger will halt itself after
the schedule has executed. Section 6 of this appendix has a sample script used to
run through a single iteration of the process described above. Section 6 includes the
command line arguments needed to start all the executables discussed here.

c. Scripts 

In order to make multiple runs of SmartNet in simulation mode, we found
it most helpful to use scripts. In the previous section, we discussed one of the many
scripts used to run SmartNet in simulation mode over and over again without
the need for human intervention at the beginning and end of each test of SmartNet.
Scripts were used throughout this research to simplify all the work performed.

Section 6 also includes a script used to run a set of experiments using multiple
command files and multiple databases. It walks through the directory
structure set up to house the experiments and performs sequences of tests. Instead of
waiting at the terminal to type the commands, they have been scripted.

Section 6 also includes the Perl scripts written to parse data from the log files
that the SmartNet logger writes. These log files include scheduling information and
runtime information. We parsed this data using the file parselog.pl. This Perl
script extracts the important information from the log files and puts it into another
file, specified in the script. This parsed information is then parsed and averaged
again with the Perl script collect.pl, also found in Section 6. This script reduces
the parsed data to a manageable form. The output is less than a page, and represents
the run-time duration information of 60 separate executions of SmartNet.



3. RUNNING SMARTNET IN SIMULATION MODE 

The preceding sections discussed the components needed to get SmartNet ready
to run, the scripts used, and the files and commands needed. Here, we put it all
together in a step-by-step format to make the process easier to follow.

1. Unpack SmartNet source code. 

2. Compile SmartNet source code into SmartNet binaries. 

3. Determine the experiments you need to perform. 

• Establish the directory structure you need so that your output is easily
identified as being produced by a certain database or command file. You
will need a .smartnetrc file in every directory from which you will run
SmartNet.

• Build your command file(s).

• Build your database(s).

• Ensure your .smartnetrc file(s) are calling the correct database file and
scheduling algorithm.

• Edit the parselog.pl and collect.pl files, as necessary. Each directory
from which you run SmartNet should contain a copy of both of these
files, and both must be executable.

4. Build your scripts specific to the command files you intend to test. You will 
want one of each type in each directory from which you are running SmartNet. 

5. Build the script that runs the different sets of SmartNet scripts listed
previously. This is the big “start it off” script.

6. Run the “start it off” script and collect your output. 
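
The directory setup in step 3 can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name make_tree, its arguments, and the idea of copying the per-directory files from a template directory are hypothetical; the algorithm and heterogeneity directory names are the ones used by the experiment script in Section 6.

```shell
#!/bin/sh
# make_tree ROOT TEMPLATE_DIR [VARIANCE_DIR]
# Hypothetical helper: creates ROOT/<algorithm>/<heterogeneity>/<variance>
# and copies the per-directory files into each leaf directory.
make_tree() {
    root=$1
    template=$2
    variance=${3:-t0.0}
    for algo in olb lba greedy fastgreedy; do
        for het in hihi hilo lohi lolo linear; do
            leaf="$root/$algo/$het/$variance"
            mkdir -p "$leaf" || return 1
            for f in .smartnetrc parselog.pl collect.pl; do
                # Copy only what the template directory actually provides.
                if [ -f "$template/$f" ]; then
                    cp "$template/$f" "$leaf/" || return 1
                fi
            done
            # The parse scripts must be executable (step 3, last bullet).
            for s in parselog.pl collect.pl; do
                if [ -f "$leaf/$s" ]; then
                    chmod u+x "$leaf/$s"
                fi
            done
        done
    done
}
```

Called as make_tree ./tests ./template, this would produce the twenty leaf directories (four algorithms times five heterogeneity categories) for one variance setting.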

Figure 43 shows how we set up our directory structure, to include naming 
conventions and files included. 
[Figure 43: directory tree rooted at ~rkarmstr/test. Figure labels: "database file
used"; leaf directories, the first named t0.0. Each leaf directory holds a .smartnetrc
file, four run scripts (125-1.sh, 125-2.sh, 500-3.sh, 500-4.sh in one column;
125-up.sh, 125-dn.sh, 500-up.sh, 500-dn.sh in the other), parselog.pl, and
collect.pl. Figure note: These were the directories from which SmartNet was run.
Output files were written to each of these directories specific to the algorithm,
categories of heterogeneity, and command file used.]

Figure 43. Directory Structure Used For Experiments. This was the directory
structure we used throughout the conduct of this research.

4. EXAMPLE COMMAND FILES 

This section contains sample command files used in the conduct of this research. 

a. Command File — The Random Method 

This sample command file tells SmartNet the names of the jobs it needs to
schedule. The jobs are read into SmartNet one at a time and in uniformly
random order; hence the name, The Random Method.

model = job1
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job4
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job4
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job3
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job2
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job2
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job4
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1

model = job3
commandline = job1
cchars = 100
stdout = /dev/null
submit = 1
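
A command file of this shape can be generated rather than typed. The sketch below is a hypothetical helper, not part of SmartNet: the function name gen_random_cmdfile, the seed argument, and the use of awk's rand() to draw a model uniformly are assumptions; the five field names and their values are copied from the sample above.

```shell
#!/bin/sh
# gen_random_cmdfile NJOBS NMODELS [SEED]
# Hypothetical helper: prints NJOBS stanzas, each naming a model drawn
# uniformly from job1..jobNMODELS, with one submission per stanza.
gen_random_cmdfile() {
    awk -v n="$1" -v m="$2" -v seed="${3:-1}" 'BEGIN {
        srand(seed)
        for (i = 0; i < n; i++) {
            k = int(rand() * m) + 1      # uniform integer in 1..m
            printf "model = job%d\n", k
            print "commandline = job1"
            print "cchars = 100"
            print "stdout = /dev/null"
            print "submit = 1"
            print ""
        }
    }'
}
```

For example, gen_random_cmdfile 125 5 > random.cmd would write 125 single-job stanzas to a file whose name is chosen here only for illustration.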



b. Command File — The Grouped Method 

This sample command file also tells SmartNet which jobs it needs to schedule.
It does so by grouping jobs. Note that job1 is requested to run 25 times; hence
the name, the grouped method.

model = job1
commandline = job1
cchars = 100
stdout = /dev/null
submit = 25

model = job2
commandline = job1
cchars = 100
stdout = /dev/null
submit = 25

model = job3
commandline = job1
cchars = 100
stdout = /dev/null
submit = 25

model = job4
commandline = job1
cchars = 100
stdout = /dev/null
submit = 25

model = job5
commandline = job1
cchars = 100
stdout = /dev/null
submit = 25
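
The grouped layout is regular enough to produce with a loop. The following is a minimal sketch; the function name gen_grouped_cmdfile and its argument order are hypothetical, while the field names and fixed values come from the sample above.

```shell
#!/bin/sh
# gen_grouped_cmdfile COUNT MODEL...
# Hypothetical helper: prints one stanza per model, each requesting
# COUNT submissions of that model.
gen_grouped_cmdfile() {
    count=$1
    shift
    for model in "$@"; do
        printf 'model = %s\n' "$model"
        printf 'commandline = job1\n'
        printf 'cchars = 100\n'
        printf 'stdout = /dev/null\n'
        printf 'submit = %s\n\n' "$count"
    done
}
```

Calling gen_grouped_cmdfile 25 job1 job2 job3 job4 job5 reproduces the five stanzas above, for a total of 125 job requests.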
5. EXAMPLE DATABASE FILE

//
// Armstrong sample database file
// for testing the SmartNet simulator
//

//
// The number of Site objects
//
0

//
// The number of Machine objects
//
4

// The IP address is repeated for all machines because
// SmartNet tries to connect to the machine even in
// simulation mode, even though it will not run anything
// on the machine. I gave it the IP address of hetero.
//
// Also, the names of the machines and jobs are notional.
// See the SmartNet Users Guide for a more realistic
// database example.

machine1 // Machine name
Sun // Architecture
131.120.2.1 // IP Address
Sun/Sparc 900
Notional
1 // Relative cost
1 // Is the machine notional?
NULL // Site Name

machine2 // Machine name
Sun // Architecture
131.120.2.1 // IP Address
Sun/Sparc 900
Notional
1 // Relative cost
1 // Is the machine notional?
NULL // Site Name

machine3 // Machine name
Sun // Architecture
131.120.2.1 // IP Address
Sun/Sparc 900
Notional
1 // Relative cost
1 // Is the machine notional?
NULL // Site Name

machine4 // Machine name
Sun // Architecture
131.120.2.1 // IP Address
Sun/Sparc 900
Notional
1 // Relative cost
1 // Is the machine notional?
NULL // Site Name

//
// The number of Model objects
//
3

job1 // Model name
Bob's Test Application1
1 // idempotent [0|1]
1 // The number of description lines
time

job2 // Model name
Bob's Test Application2
1 // idempotent [0|1]
1 // The number of description lines
time

job3 // Model name
Bob's Test Application3
1 // idempotent [0|1]
1 // The number of description lines
time
//
// The number of ModelMachine objects
//
12

machine1 // Machine name
job1 // Model name
NULL // Group Name
normal // distribution type
300034.00 // moment-1 CHANGE FOR EACH
900102.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 3000.34 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine1 // Machine name
job2 // Model name
NULL // Group Name
normal // distribution type
25.0 // moment-1 CHANGE FOR EACH
75.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 0.25 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine1 // Machine name
job3 // Model name
NULL // Group Name
normal // distribution type
1078.0 // moment-1 CHANGE FOR EACH
3234.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 10.78 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data



machine2 // Machine name
job1 // Model name
NULL // Group Name
normal // distribution type
11.0 // moment-1 CHANGE FOR EACH
33.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 0.11 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine2 // Machine name
job2 // Model name
NULL // Group Name
normal // distribution type
1003.0 // moment-1 CHANGE FOR EACH
3009.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 10.03 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine2 // Machine name
job3 // Model name
NULL // Group Name
normal // distribution type
93.0 // moment-1 CHANGE FOR EACH
279.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 0.93 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data



machine3 // Machine name
job1 // Model name
NULL // Group Name
normal // distribution type
239.0 // moment-1 CHANGE FOR EACH
717.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 2.39 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine3 // Machine name
job2 // Model name
NULL // Group Name
normal // distribution type
8619.0 // moment-1 CHANGE FOR EACH
25857.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 86.19 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine3 // Machine name
job3 // Model name
NULL // Group Name
normal // distribution type
1950.0 // moment-1 CHANGE FOR EACH
5850.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 19.5 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data



machine4 // Machine name
job1 // Model name
NULL // Group Name
normal // distribution type
30097.0 // moment-1 CHANGE FOR EACH
90291.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 300.97 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine4 // Machine name
job2 // Model name
NULL // Group Name
normal // distribution type
75.0 // moment-1 CHANGE FOR EACH
225.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 0.75 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data

machine4 // Machine name
job3 // Model name
NULL // Group Name
normal // distribution type
204001.0 // moment-1 CHANGE FOR EACH
612003.0 // moment-2 CHANGE FOR EACH
0.0 // moment-3
$0 * 2040.01 // Theoretical compute function
$0 * 0 // Theoretical Network function
1 // Theoretical data use function
NULL // Theoretical floating-point function
// Compute Data:
0 // The amount of Experiential data
0 // The amount of normalized Experiential data
// Network Data:
0 // The amount of Experiential data



//
// The SNData default Override object:
//
NULL // Model name
NULL // Machine name
ExecutionEquation NULL
DataUseEquation NULL
NetworkEquation NULL
ComputeWeight 1
NetworkWeight 1
TheoreticalExecutionWeight 0.5
ExperientialExecutionWeight 0.5
OverrideExecutionWeight 0.5
TheoreticalNetworkWeight 0.5
ExperientialNetworkWeight 0.5
OverrideNetworkWeight 0.5
End_Override

//
// inter-site network information (bandwidth & latency)
//
End_NetMatrix
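
Each ModelMachine entry in the sample follows the same template, and in every entry shown moment-2 is three times moment-1 and the compute-function coefficient is moment-1 divided by 100. The sketch below reproduces that observed pattern so an entry can be regenerated when moment-1 changes. It is an illustration only: the function name emit_model_machine is hypothetical, and the pattern is read off the sample values, not taken from SmartNet documentation.

```shell
#!/bin/sh
# emit_model_machine MACHINE MODEL MOMENT1
# Hypothetical helper: prints one ModelMachine entry using the pattern
# seen in the sample (moment-2 = 3 * moment-1, coefficient = moment-1 / 100).
emit_model_machine() {
    LC_ALL=C awk -v mach="$1" -v model="$2" -v m1="$3" 'BEGIN {
        printf "%s // Machine name\n", mach
        printf "%s // Model name\n", model
        print  "NULL // Group Name"
        print  "normal // distribution type"
        printf "%.1f // moment-1\n", m1
        printf "%.1f // moment-2\n", 3 * m1
        print  "0.0 // moment-3"
        printf "$0 * %g // Theoretical compute function\n", m1 / 100
        print  "$0 * 0 // Theoretical Network function"
        print  "1 // Theoretical data use function"
        print  "NULL // Theoretical floating-point function"
        print  "// Compute Data:"
        print  "0 // The amount of Experiential data"
        print  "0 // The amount of normalized Experiential data"
        print  "// Network Data:"
        print  "0 // The amount of Experiential data"
    }'
}
```

For instance, emit_model_machine machine1 job2 25 yields the values 25.0, 75.0, and $0 * 0.25 that appear in the second entry above.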
6. EXAMPLE SCRIPTS

a. Script for Starting and Running SmartNet: 125-1.sh

This script makes it easy to start and run SmartNet in simulation mode. The
script starts SmartNet, executes a schedule in simulation mode, and then stops
SmartNet. If you need to do this repeatedly, the script should include multiple
sequences of the commands below. We built scripts like the one below for
each separate command file. They were located in the directory from which we ran
SmartNet for that particular experiment.

#!/bin/ksh

# Start the master/server/queue
# -S is for simulation mode
# -s denotes the scheduling algorithm we desire to use.
# This can also be specified in the .smartnetrc file.
# -f denotes the name of the database file to be loaded
# into SmartNet
smartnet -master -S -s OLB -f /users/work3/rkarmstr/tests/hihi.0.0.dat &

# This allows things to start up correctly
sleep 10

# Start the logger
# -n tells the logger how many jobs will be scheduled so that it
# knows when to die
sn-log -n 125 -o test125-1-1.log &

# This allows things to start up correctly
sleep 3

# Start SmartNet submit
# -S is for simulation mode
# the required argument is the name of the command file
# listing the job requests
sn-submit -S /users/work3/rkarmstr/tests/test125-1.cmd &

# Wait for the SmartNet logger to die
wait %2

# Kill SmartNet submit
kill -QUIT %3

# Wait for SmartNet submit to die
sleep 10

# Kill the SmartNet master/server/queue 
sn-control — OFF 
b. Script for Running Experiments: tt0.0.sh

#!/bin/ksh

# This is a script to run all 0.0 variance tests
# for hihi|hilo|lohi|lolo|linear heterogeneous sets
# on olb|lba|greedy|fastgreedy algorithms.

# olb tests

mail rkarmstr < /users/work3/rkarmstr/SOLARIS/local/tests/mmolb

cd /users/work3/rkarmstr/SOLARIS/local/tests/olb/hihi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/olb/hilo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/olb/lohi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/olb/lolo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/olb/linear/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

# lba tests

mail rkarmstr < /users/work3/rkarmstr/SOLARIS/local/tests/mmlba

cd /users/work3/rkarmstr/SOLARIS/local/tests/lba/hihi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/lba/hilo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/lba/lohi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/lba/lolo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/lba/linear/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

# greedy tests

mail rkarmstr < /users/work3/rkarmstr/SOLARIS/local/tests/mmgreedy

cd /users/work3/rkarmstr/SOLARIS/local/tests/greedy/hihi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/greedy/hilo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/greedy/lohi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/greedy/lolo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/greedy/linear/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

# fastgreedy tests

mail rkarmstr < /users/work3/rkarmstr/SOLARIS/local/tests/mmfastgreedy

cd /users/work3/rkarmstr/SOLARIS/local/tests/fastgreedy/hihi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/fastgreedy/hilo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/fastgreedy/lohi/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/fastgreedy/lolo/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

cd /users/work3/rkarmstr/SOLARIS/local/tests/fastgreedy/linear/t0.0
125-1.sh
sleep 10
125-2.sh
sleep 10
500-3.sh
sleep 10
500-4.sh
sleep 30
parselog.pl
collect.pl

mail rkarmstr < /users/work3/rkarmstr/SOLARIS/local/tests/mmdone
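
The twenty near-identical blocks in the script above can also be expressed as two nested loops. The following is a sketch of that alternative, not the script used in this research: the function names run_step and run_all and the DRY_RUN switch are hypothetical, the mail notifications are omitted, and the scripts are invoked with an explicit ./ prefix instead of relying on the current directory being in the search path.

```shell
#!/bin/sh
# run_step CMD [ARG...]
# With DRY_RUN=1 the command is printed instead of executed.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# run_all BASE [VARIANCE_DIR]
# Visits BASE/<algorithm>/<heterogeneity>/<variance> in the same order as
# the script above and runs the same sequence of commands in each.
run_all() {
    base=$1
    variance=${2:-t0.0}
    for algo in olb lba greedy fastgreedy; do
        for het in hihi hilo lohi lolo linear; do
            dir="$base/$algo/$het/$variance"
            run_step cd "$dir" || return 1
            for script in 125-1.sh 125-2.sh 500-3.sh; do
                run_step "./$script"
                run_step sleep 10
            done
            run_step ./500-4.sh
            run_step sleep 30
            run_step ./parselog.pl
            run_step ./collect.pl
        done
    done
}
```

Setting DRY_RUN=1 before calling run_all prints the full command sequence, which is a cheap way to check the directory names before committing to a long run.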
7. EXAMPLE PARSE SCRIPTS

a. Parsing Run-Time Data From Log Files: parselog.pl

#!/bin/perl

# This Perl script is meant to run on Perl version 5.0.
# Perl 5 is loaded onto virgo.
# This script is written for 0 variance tests. That is why
# it only looks for 1 repetition of the logfile. For tests
# where you run SmartNet more than once for each command file,
# you need to change the "1" to "15" or whatever number
# of reps you run. See the note at each place needing change.

use Cwd;

while (<*.log>) {
    chmod 0600, $_;
}

@files = ("test125-1", "test125-2", "test500-3", "test500-4");

for ($yy = 0; $yy < 4; $yy++) {
    open(OUT, ">parse-$files[$yy].log");
    print OUT "Data parsed from file:\t$files[$yy].log\n\n\n";
    $dir = cwd();
    print OUT "Output from directory:\n\t$dir\n\n";
    $sum = 0;
    for ($ix = 0; $ix < 1; $ix++) {    ## Need to change the "1" to "15" normally
        $iy = $ix + 1;
        $aa = 0;
        $flag = 0;
        $count = 0;
        $machine1 = 0;
        $machine2 = 0;
        $machine3 = 0;
        $machine4 = 0;
        $machine5 = 0;
        $machine6 = 0;
        $machine7 = 0;
        $machine8 = 0;
        $machine9 = 0;
        $machine10 = 0;
        $job1 = 0;
        $job2 = 0;
        $job3 = 0;
        $job4 = 0;
        $job5 = 0;
        open(IN, "$files[$yy]-$iy.log") or die "Can't open $files[$yy]-$iy.log\n";
        while ($line = <IN>) {
            ($one, $two, $three, $four, $five, $six) = split(" ", $line);
            if (($one eq "SCHED") && ($flag == 0)) {
                if ($four eq "host<machine1>") {
                    $machine1++;
                } elsif ($four eq "host<machine2>") {
                    $machine2++;
                } elsif ($four eq "host<machine3>") {
                    $machine3++;
                } elsif ($four eq "host<machine4>") {
                    $machine4++;
                } elsif ($four eq "host<machine5>") {
                    $machine5++;
                } elsif ($four eq "host<machine6>") {
                    $machine6++;
                } elsif ($four eq "host<machine7>") {
                    $machine7++;
                } elsif ($four eq "host<machine8>") {
                    $machine8++;
                } elsif ($four eq "host<machine9>") {
                    $machine9++;
                } elsif ($four eq "host<machine10>") {
                    $machine10++;
                }
            }
            if (($one eq "SCHED") && ($flag == 0)) {
                if ($five eq "model<job1>") {
                    $job1++;
                } elsif ($five eq "model<job2>") {
                    $job2++;
                } elsif ($five eq "model<job3>") {
                    $job3++;
                } elsif ($five eq "model<job4>") {
                    $job4++;
                } elsif ($five eq "model<job5>") {
                    $job5++;
                }
            }
            if (($one eq "START") && ($flag == 0)) {
                $three =~ s/time<//g;
                $three =~ s/>//g;
                $start[$ix] = $three;
                print OUT "Run $iy: start:\t\t$start[$ix]\n";
                $flag = 1;
            }
            if ($one eq "DONE") {
                $count++;
                if (($count == 125) and ($yy < 2)) {
                    $three =~ s/time<//g;
                    $three =~ s/>//g;
                    $end[$ix] = $three;
                    print OUT "Run $iy: end:\t\t$end[$ix]\n";
                } elsif (($count == 500) and ($yy > 1)) {
                    $three =~ s/time<//g;
                    $three =~ s/>//g;
                    $end[$ix] = $three;
                    print OUT "Run $iy: end:\t\t$end[$ix]\n";
                }
            }
        }
        $duration[$ix] = $end[$ix] - $start[$ix];
        print OUT "DURATION for Run $iy is: $duration[$ix]\n\n";
        $sum = $sum + $duration[$ix];
        close IN;
        print OUT "Number of machine1 assignments: $machine1\n";
        print OUT "Number of machine2 assignments: $machine2\n";
        print OUT "Number of machine3 assignments: $machine3\n";
        print OUT "Number of machine4 assignments: $machine4\n";
        print OUT "Number of machine5 assignments: $machine5\n";
        print OUT "Number of machine6 assignments: $machine6\n";
        print OUT "Number of machine7 assignments: $machine7\n";
        print OUT "Number of machine8 assignments: $machine8\n";
        print OUT "Number of machine9 assignments: $machine9\n";
        print OUT "Number of machine10 assignments: $machine10\n\n";
        print OUT "Number of job1 assignments: $job1\n";
        print OUT "Number of job2 assignments: $job2\n";
        print OUT "Number of job3 assignments: $job3\n";
        print OUT "Number of job4 assignments: $job4\n";
        print OUT "Number of job5 assignments: $job5\n\n";
    }
    $average = $sum/1;    ## Need to change to "15" normally
    print OUT "\nAverage runtime for $files[$yy] is: $average\n";
    close OUT;
}
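
The duration calculation at the heart of parselog.pl can be illustrated in a few lines of awk. This is a minimal sketch under assumptions: the function name run_duration is hypothetical, and the log layout (event keyword in the first field, a time<...> stamp in the third) is inferred from the way the Perl script splits each line, not from SmartNet documentation.

```shell
#!/bin/sh
# run_duration NJOBS LOGFILE
# Hypothetical helper: prints the time between the first START record and
# the NJOBS-th DONE record; fails if the log holds fewer DONE records.
run_duration() {
    LC_ALL=C awk -v njobs="$1" '
        # Strip the time<...> wrapper and return the numeric stamp.
        function stamp(field) {
            gsub(/time<|>/, "", field)
            return field + 0
        }
        $1 == "START" && !started { start = stamp($3); started = 1 }
        $1 == "DONE" { if (++done == njobs) end = stamp($3) }
        END {
            if (started && done >= njobs) print end - start
            else exit 1
        }
    ' "$2"
}
```

As in the Perl script, only the first START record sets the start time, and the end time is taken from the DONE record that completes the expected job count (125 or 500 in the experiments described here).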
b. Collecting Run-Time Data: collect.pl

#!/bin/perl
use Cwd;

@files = <parse-*.log>;

$dir = cwd();

($first, $users, $work3, $rkarmstr, $solaris, $here, $tests, $algorithm, $hetero| 

$ix = 0;

open(OUT, ">$algorithm.collect") or die "Cannot open $algorithm.collect\n";
print OUT "Algorithm :\t$algorithm\nHeterogeneity;\t$heterogeneity\nTest run:\t$v< 
while (@files) {
    $file = shift @files;
    open(IN, $file) or die "Can't open $file\n";
    while (<IN>) {
        if (/Average runtime for test([0-9.]+)-([0-9]) is: *([0-9.]+)/) {
            $average = $3;
            print OUT "The average runtime for test$1-$2 is: $average \n";
        }
    }
    close(IN);
}

close(OUT);
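
The collection step amounts to pulling one summary line out of each parsed log. A shell equivalent is sketched below for illustration; the function name collect_averages is hypothetical, and the pattern it matches is the "Average runtime for ..." line that parselog.pl writes at the end of each parse-*.log file.

```shell
#!/bin/sh
# collect_averages DIR
# Hypothetical helper: prints one summary line per parse-*.log file in DIR.
collect_averages() {
    for f in "$1"/parse-*.log; do
        # Skip the literal pattern when no file matches.
        [ -f "$f" ] || continue
        sed -n 's/^Average runtime for test\([0-9.]*\)-\([0-9]\) is: *\([0-9.]*\).*$/The average runtime for test\1-\2 is: \3/p' "$f"
    done
}
```

Unlike collect.pl, this sketch writes to standard output and leaves the choice of output file to the caller.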